Abstract

When testing a large number of independent hypotheses, three different questions are of
interest: are some hypotheses true alternatives? How many of them? Which of them?
These questions give rise to a detection, an estimation, and a selection problem. Recent
work demonstrates the existence of intrinsic bounds in these problems: detection and
estimation boundaries in sparse location models, and criticality for the selection problem.
We study consequences of such limitations in terms of power of False Discovery Rate (FDR)
controlling procedures. FDR is the expected False Discovery Proportion (FDP), that is, the
expected proportion of false rejections among all rejected hypotheses.

For the selection problem, we illustrate the connection between criticality and the
regularity of the distribution of the test statistics, and discuss expected and observed
consequences of criticality in terms of power of FDR controlling procedures, on both simulated
and real data. For the problem of estimating the fraction of true null hypotheses, we make
explicit connections between the parameters of the multiple testing problem and consistency
and convergence rates of a broad class of non-parametric estimators, and prove that
these convergence rates determine that of the FDP achieved by "plug-in" multiple testing
procedures, which incorporate such an estimator in order to yield tighter FDR control.

Keywords: multiple testing, False Discovery Rate, Benjamini-Hochberg procedure,
power, criticality, proportion of true null hypotheses.
1. Introduction

Multiple simultaneous hypothesis testing has become a major issue for high-dimensional 
data analysis in a variety of fields, including non-parametric estimation by wavelet methods 
in image analysis, functional magnetic resonance imaging (fMRI) in medicine, source
detection in astronomy, or DNA microarray analysis in genomics. Given a possibly large set
of observations corresponding either to a null hypothesis $\mathcal{H}_0$, or an alternative hypothesis
$\mathcal{H}_1$, several questions are of interest:

1. a detection problem: are there any true alternatives? 

2. an estimation problem: how many hypotheses are true alternatives? 

3. a selection problem: which hypotheses are true alternatives? 

These three problems have been studied in the framework of mixture models: a p-value
of the test of the null hypothesis $\mathcal{H}_0$ against the alternative $\mathcal{H}_1$ is associated with each
observation, and the distribution of these p-values is modeled as a mixture of a null and
an alternative distribution.

The detection and the estimation problem can be viewed as standard testing and
estimation problems. The originality of recent contributions (Abramovich et al., 2006;
Cai et al., 2007; Donoho and Jin, 2004, 2006; Jin and Cai, 2007; Meinshausen and Rice,
2006; Jin, 2008) comes from the fact that they focus on sparse mixture models, in which
the fraction of true alternatives tends to 0, and the dissimilarity between the distributions
under $\mathcal{H}_0$ and $\mathcal{H}_1$ increases as the number $m$ of tested hypotheses tends to $+\infty$.

The selection problem is by nature less standard: as it involves testing a large number
of hypotheses, it requires the definition of appropriate risk measures. The False Discovery
Rate (FDR) introduced by Benjamini and Hochberg (1995) has become the most popular
of these measures. FDR is the expected False Discovery Proportion (FDP), that is, the
expected proportion of erroneous rejections among selected hypotheses. A related quantity is
the Positive False Discovery Rate (pFDR), that is, the conditional expectation of FDP given
that at least one discovery has been made. Benjamini and Hochberg (1995) proved that the
so-called BH95 procedure controls FDR at any desired level in [0, 1]. The selection problem
has mostly been studied in dense (that is, non-sparse) situations where all model parameters
remain fixed as $m \to +\infty$ (Benjamini and Hochberg, 1995; Genovese and Wasserman,
2002, 2004; Storey, 2002).

1.1 Intrinsic bounds in multiple comparison problems 

A natural question is whether there exist constraints on the performance of a given pro- 
cedure for the detection, estimation or selection problem, or intrinsic limits to these three 
problems. 

Detection. The detection problem consists in testing the null hypothesis that the pro- 
portion of true null hypotheses is 1, against the alternative that it is smaller than 1. Intrinsic 
bounds to the detection problem have been characterized in the case of sparse Gaussian 
mixtures (Ingster, 1997, 1999; Jin, 2002; Ingster and Suslina, 2003) : a sharp detection 
boundary separates situations in which the Likelihood Ratio Test (LRT) asymptotically 
almost surely correctly detects, from situations in which it asymptotically almost surely 
fails to detect. Donoho and Jin (2004) proved that a procedure named higher criticism, 
originally proposed by Tukey (1976), achieves a quasi-optimal detection boundary for a few
specific sparse location models, including Gaussian mixtures. 

Estimation. Focusing on sparse Gaussian mixtures, Cai et al. (2007) proved that the 
region where the detection problem can be solved coincides with the region where the frac- 
tion of true alternatives can be consistently estimated. They derived minimax convergence 
rates in this region, and proposed an estimation procedure that achieves the optimal rate.
Meinshausen and Rice (2006) focused on a family of estimators and derived the corresponding
estimation boundary; their results are valid for any sparse mixture.

In the case of Gaussian mixtures, Jin (2008) suggested estimating the fraction of true
null hypotheses using the Fourier transform of the characteristic function of the p-values,
and proved the consistency of a family of such estimators in both sparse and non-sparse
situations.

Selection. For the selection problem, Chi (2007a) demonstrated the existence of a possibly
positive lower bound below which no multiple testing procedure can control pFDR.
In such "critical" situations, the power of the BH95 procedure converges to 0 in probability.
Criticality is not specific to pFDR. Other risk measures in multiple testing problems have
the same kind of intrinsic limitations, for example the positive False Discovery Excessive
Probability (pFDEP), that is, the conditional probability that the FDP exceeds a given
threshold (Chi and Tan, 2008).

In applications it is common for the p-values to have been generated by a set of longitudinal
observations. For example, in genomic studies one typically tests for the equality
of gene expression levels across two groups of samples, and one p-value is generated for
each of $m$ genes of interest. The influence of longitudinal sample size (the number of data
points used to generate each p-value) on criticality has been studied by Chi (2007b).
1.2 Estimation, selection, and power of FDR controlling procedures

Although the concept of FDR control was introduced for the selection problem in the dense
mixture model, FDR and the BH95 procedure have also been successfully applied to sparse
settings, by Donoho and Jin (2004) for the detection problem, and by Abramovich and Benjamini
(1995) and Donoho and Jin (2006) for the selection problem, in which it was demonstrated
to satisfy remarkable minimax properties.

In this paper, we study how the settings of a multiple comparison problem induce 
limitations to the estimation and selection problems, and how these limitations translate 
in terms of power of FDR controlling procedures. We consider a dense setting, where the 
proportion of true null hypotheses is positive and fixed, and the distributions of the p-values
under the null and alternative are fixed as well.

Organization of the paper. Section 2 provides background and notation. In Section
3 we give theoretical interpretation and illustration of criticality, and discuss expected
and observed practical consequences of criticality in terms of power of FDR controlling
procedures. In Section 4 we analyze the problem of estimating the fraction $\pi_0$ of true null
hypotheses based on observed p-values near 1. We study how convergence properties of the
FDR achieved by plug-in FDR controlling procedures based on an estimator $\hat\pi_0$ of $\pi_0$ are
determined by convergence properties of $\hat\pi_0$, which are in turn determined by regularity
properties of the distribution of p-values near 1.

2. Background and notation 
2.1 Model 

Testing one hypothesis. As we are interested in applications such as microarray
data analysis in which each observation is the result of a test based on longitudinal data,
we explicitly model one observation as a realization from a test statistic $X$. We assume
that $X$ is distributed as $F_0$ under the null hypothesis $\mathcal{H}_0$ and as $F_1$ under the alternative
hypothesis $\mathcal{H}_1$, and denote by $f_0$ and $f_1$ the corresponding density functions. This testing
problem may be formulated in terms of p-values rather than test statistics. The p-value function
is defined as $p^+ : x \mapsto \mathbb{P}_{\mathcal{H}_0}(X \ge x)$ for one-sided tests and $p^\pm : x \mapsto \mathbb{P}_{\mathcal{H}_0}(|X| \ge |x|)$ for
two-sided tests. By definition the p-values are uniform on [0, 1] under $\mathcal{H}_0$; their distribution
functions under $\mathcal{H}_1$ are derived in the next two Propositions.

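Proposition 1 (One-sided p-value) The one-sided p-value at observation $x \in \mathbb{R}$ may be
written as $p^+(x) = 1 - F_0(x)$. The corresponding distribution function $G_1^+$ and density $g_1^+$ under the
alternative hypothesis $\mathcal{H}_1$ are respectively given by

$$G_1^+(u) = 1 - F_1\left(F_0^{-1}(1-u)\right) \qquad (1)$$

$$g_1^+(u) = \frac{f_1}{f_0}\left(F_0^{-1}(1-u)\right) \qquad (2)$$

for any $u \in [0, 1]$.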



For two-sided tests we will assume that $F_0$ is symmetric, that is, that we have

$$\forall x \in \mathbb{R},\quad F_0(x) + F_0(-x) = 1,$$

or, equivalently, that

$$\forall x \in \mathbb{R},\quad f_0(-x) = f_0(x).$$
Proposition 2 (Two-sided p-value) Assume that $F_0$ is symmetric. Then the two-sided p-value
at observation $x \in \mathbb{R}$ may be written as $p^\pm(x) = 2\left(1 - F_0(|x|)\right)$. The corresponding distribution
function and density under the alternative hypothesis $\mathcal{H}_1$ are given by

$$G_1^\pm(u) = 1 - F_1\left(F_0^{-1}(1-u/2)\right) + F_1\left(F_0^{-1}(u/2)\right) \qquad (3)$$

$$g_1^\pm(u) = \frac{1}{2}\left(\frac{f_1}{f_0}\left(F_0^{-1}(1-u/2)\right) + \frac{f_1}{f_0}\left(F_0^{-1}(u/2)\right)\right) \qquad (4)$$

for any $u \in [0, 1]$.

Corollary 3 (One and two-sided p-values) If $F_0$ is symmetric, then the distribution function
and density function of one- and two-sided p-values under the alternative hypothesis $\mathcal{H}_1$ are connected
by:

$$G_1^\pm(u) = G_1^+(u/2) + 1 - G_1^+(1-u/2) \qquad (5)$$

$$g_1^\pm(u) = \frac{1}{2}\left(g_1^+(u/2) + g_1^+(1-u/2)\right) \qquad (6)$$

Testing several hypotheses: conditional mixture model. We assume that $m$
tests are performed as described in the preceding paragraph. For $i \in \{1, \dots, m\}$, we let
$Y_i = 0$ if hypothesis $i$ is drawn from the null hypothesis $\mathcal{H}_0$, and $Y_i = 1$ if it is drawn
from the alternative $\mathcal{H}_1$; $X_i$ denotes the corresponding test statistic. Following Efron et al.
(2001); Genovese and Wasserman (2002, 2004); Storey (2003), we assume that the random
variables $(X_i, Y_i)_{1 \le i \le m}$ are independent and identically distributed: $Y_i$ is a Bernoulli random
variable with success probability $1 - \pi_0$, where $\pi_0$ is the unknown proportion of true null
hypotheses; the conditional distribution of $X_i$ given $Y_i$ is $F_1$ if $Y_i = 1$ and $F_0$ if $Y_i = 0$.
The marginal distribution of each $X_i$ is thus

$$F = \pi_0 F_0 + (1 - \pi_0) F_1,$$

and we denote by $f = \pi_0 f_0 + (1 - \pi_0) f_1$ the corresponding density. We denote by $G_1$ and
$g_1$ the cumulative distribution function and the density function of the p-values under
$\mathcal{H}_1$: we have $G_1 = G_1^+$ and $g_1 = g_1^+$ (as given by Proposition 1) if one-sided p-values are
calculated, and $G_1 = G_1^\pm$ and $g_1 = g_1^\pm$ (as given by Proposition 2) if two-sided p-values are
calculated. The p-values are uniform on [0, 1] under $\mathcal{H}_0$; we denote by $G_0$ the corresponding
cumulative distribution function, which is the identity function. The marginal distribution
function and density of the p-values under the mixture are given by $G = \pi_0 G_0 + (1 - \pi_0) G_1$
and $g = \pi_0 + (1 - \pi_0) g_1$.

In this setting, the number $m_0(m)$ of true null hypotheses for a given total number $m$
of hypotheses tested is a random variable, which satisfies $\mathbb{E}[m_0(m)/m] = \pi_0$. In order to
lighten notation, we will assume without loss of generality that $m_0(m)/m = \pi_0$ for any
$m$.
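
To make the conditional mixture model concrete, the following Python sketch draws $m$ pairs $(X_i, Y_i)$ and converts each test statistic into a one-sided p-value. It assumes a Gaussian location model ($F_0 = \mathcal{N}(0,1)$, $F_1 = \mathcal{N}(\theta,1)$), which is only one possible instance of the model; the function names, the seed and the parameter values are illustrative choices made for this sketch.

```python
import math
import random


def one_sided_p_value(x):
    """p^+(x) = 1 - F_0(x) when F_0 is the standard Gaussian distribution (assumption)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_p_values(m, pi0, theta, seed=0):
    """Draw (Y_i, p-value_i) for i = 1..m under the conditional mixture model.

    Y_i is Bernoulli(1 - pi0); X_i is N(theta, 1) if Y_i = 1 and N(0, 1) if Y_i = 0.
    """
    rng = random.Random(seed)
    labels, p_values = [], []
    for _ in range(m):
        is_alternative = rng.random() >= pi0  # happens with probability 1 - pi0
        mean = theta if is_alternative else 0.0
        x = rng.gauss(mean, 1.0)
        labels.append(int(is_alternative))
        p_values.append(one_sided_p_value(x))
    return labels, p_values
```

For instance, `simulate_p_values(100, 0.8, 2.0)` returns 100 labels and 100 p-values. In such a simulation the fraction of true nulls is random and equals $\pi_0$ only in expectation.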

Settings. We will assume that $G_1$ is concave. As $G$ is an affine transform of $G_1$, this
is equivalent to assuming that $G$ is concave. For one-sided p-values, this is also equivalent
to assuming that the likelihood ratio $f_1/f_0$ of the test statistics is non-decreasing (by
Equation (2)), that is, that $\mathcal{H}_1$ dominates $\mathcal{H}_0$.

Condition 1 (Concavity) $G_1$ is concave.

When studying two-sided p-values we will assume that the distribution function of the
test statistics under $\mathcal{H}_0$ is symmetric:

Condition 2 (Symmetry)

$$\forall x \in \mathbb{R},\quad F_0(x) + F_0(-x) = 1.$$






p. Neuvial 



2.2 False Discovery Rate control 

The concept of False Discovery Rate (FDR) was introduced by Benjamini and Hochberg
(1995) in the context of the selection problem. Given a positive rejection threshold $t$ for $\mathcal{H}_0$,
let $R(t)$ denote the total number of rejections, and $V(t)$ the number of illegitimate rejections
at $t$ among $m$ tested hypotheses. The False Discovery Proportion at threshold $t$ is defined
by $\mathrm{FDP}(t) = \frac{V(t)}{R(t) \vee 1}$, and the corresponding False Discovery Rate is

$$\mathrm{FDR}(t) = \mathbb{E}\left[\mathrm{FDP}(t)\right].$$

A related quantity is the positive false discovery rate (pFDR), that is, the conditional
expectation of FDP given that at least one discovery is made:

$$\mathrm{pFDR}(t) = \frac{\pi_0 t}{G(t)}.$$

FDR and pFDR are tightly connected, as we have $\mathrm{FDR}(t) = \mathrm{pFDR}(t)\,\mathbb{P}(R(t) > 0)$. In particular
they are asymptotically equivalent for procedures with fixed rejection regions because
$\mathbb{P}(R(t) > 0) \to 1$, as shown by Storey et al. (2004).

The BH95 procedure. Benjamini and Hochberg (1995), elaborating on previous work
by Simes (1986), proposed a simple procedure (henceforth denoted by the BH95 procedure)
to control FDR. Suppose we wish to control FDR at level $\alpha$, and let $P_{(1)} \le \dots \le P_{(m)}$ be
the sorted p-values. Now let $\hat{I}_m$ be the largest index $k$ such that

$$P_{(k)} \le \alpha \frac{k}{m}.$$

If there is such an index, then all hypotheses with p-values not larger than $\hat\tau_m = \alpha \hat{I}_m / m$ are
rejected. Otherwise, no rejection is made. The BH95 procedure provides strong control of
the FDR (Benjamini and Hochberg, 1995):

$$\mathrm{FDR}(\hat\tau_m) \le \pi_0 \alpha.$$
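
The definition of the BH95 procedure can be sketched in a few lines of Python. This is a minimal illustration written for this text (the function names `bh95_threshold` and `bh95_reject` are ours), not a reference implementation; hypotheses whose p-value is at most the threshold are rejected.

```python
def bh95_threshold(p_values, alpha):
    """Return alpha * I / m, where I is the largest k with P_(k) <= alpha * k / m.

    Returns 0.0 when no index satisfies the condition (no rejection).
    """
    m = len(p_values)
    largest_k = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * k / m:
            largest_k = k  # keep the largest qualifying index (step-up rule)
    return alpha * largest_k / m


def bh95_reject(p_values, alpha):
    """Return one boolean per hypothesis: True if it is rejected by BH95."""
    threshold = bh95_threshold(p_values, alpha)
    if threshold == 0.0:
        return [False] * len(p_values)
    return [p <= threshold for p in p_values]
```

Note the step-up behaviour: with p-values (0.02, 0.03, 0.04) and $\alpha = 0.05$, the smallest p-value exceeds $\alpha/3$, yet all three hypotheses are rejected because the largest qualifying index is $k = 3$.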

Figure 1 illustrates the application of the BH95 procedure with $\alpha = 0.2$ to $m = 100$
simulated hypotheses, among which 20 are true alternatives. The left panel illustrates the
above definition of the BH95 procedure, while the right panel gives an interpretation of this
definition in terms of crossing point between the line $y = x/\alpha$ and the empirical distribution
function $\hat{G}_m$ of the p-values:

$$\hat\tau_m = \sup\left\{u \in [0, 1] : \hat{G}_m(u) \ge u/\alpha\right\}.$$

Plug-in procedures. The BH95 procedure controls FDR at level $\pi_0 \alpha$ in the above
mixture model (Benjamini and Hochberg, 1995). Applying this procedure at level $\alpha/\pi_0$
would therefore achieve FDR = $\alpha$ exactly. However, as $\pi_0$ is unknown, this is only an
Oracle procedure. It is thus natural to try to estimate $\pi_0$ by some $\hat\pi_0 \le 1$, and apply the
BH95 procedure at level $\alpha/\hat\pi_0$, yielding a larger number of significant hypotheses for the same
target FDR level (Benjamini and Hochberg, 2000). These "plug-in" procedures therefore
have the same geometric interpretation as the BH95 procedure (see Figure 1) in terms of
crossing point, with $\alpha/\hat\pi_0$ instead of $\alpha$, and their rejection threshold can be written as

$$\hat\tau_m = \sup\left\{u \in [0, 1] : \frac{\hat\pi_0 u}{\hat{G}_m(u)} \le \alpha\right\},$$

where $\hat{G}_m$ denotes the empirical distribution function of the p-values.
Figure 1: Illustrations of the BH95 procedure on a simulated example. Left: sorted p-values;
each dot corresponds to one of 100 tested hypotheses. Right: empirical distribution
function.



Recalling that $\mathrm{pFDR}(t) = \pi_0 t / G(t)$, the rejection threshold of the plug-in procedure associated
with $\hat\pi_0$ can therefore be interpreted as the rightmost point for which the corresponding
estimate of pFDR is upper bounded by $\alpha$.
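
As an illustration, here is a minimal Python sketch of a plug-in procedure. It assumes a Storey-type estimator $\hat\pi_0(\lambda) = \#\{i : P_i > \lambda\} / (m(1-\lambda))$ with $\lambda = 0.5$; this is one common choice, used here only as a placeholder and not necessarily one of the estimators analyzed in this paper. The function names and the default value of `lam` are hypothetical.

```python
def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of pi0 (assumed estimator): #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(p_values)
    count_above = sum(1 for p in p_values if p > lam)
    return min(1.0, count_above / (m * (1.0 - lam)))


def plug_in_threshold(p_values, alpha, pi0_hat):
    """Rejection threshold of BH95 applied at level alpha / pi0_hat.

    This is the largest (alpha / pi0_hat) * k / m such that
    P_(k) <= (alpha / pi0_hat) * k / m, or 0.0 if there is no such k.
    """
    if pi0_hat <= 0.0:
        raise ValueError("pi0_hat must be positive")
    m = len(p_values)
    level = alpha / pi0_hat
    largest_k = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= level * k / m:
            largest_k = k
    return min(1.0, level * largest_k / m)
```

With `pi0_hat = 1.0` this reduces to the BH95 threshold; any estimate below 1 can only increase the threshold, and hence the number of rejections, for the same target level $\alpha$.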

We note that this idea is not specific to FDR control, as it can be (and in fact originally
was) applied to the control of the Family-Wise Error Rate (FWER), that is, the probability of at
least one false rejection: for example, the Bonferroni procedure at level $\alpha$ controls FWER
at level $\pi_0 \alpha$, and corresponding plug-in procedures have been developed along similar
lines (Hochberg and Benjamini, 1990).



2.3 Criticality of the selection problem. 

Chi (2007a) noticed that, depending on the distribution function $G$ of the p-values, $(\mathrm{pFDR}(t))_{t>0}$
may be bounded away from 0, giving rise to a phenomenon that he called criticality: no
selection procedure can achieve pFDR smaller than $\alpha^* = \inf_{t>0} \mathrm{pFDR}(t)$. Importantly, $\alpha^*$
is intrinsic to the selection problem in the sense that it only depends on the parameters of
the mixture model:

$$\alpha^* = \inf_{t>0} \frac{\pi_0 t}{G(t)}.$$

In particular, $\alpha^*$ is defined without reference to any multiple comparison procedure.
Criticality reveals an interesting range of situations in which FDR and pFDR are not
asymptotically equivalent anymore (Chi and Tan, 2008): given a multiple comparison problem
such that $\alpha^* > 0$, any procedure that controls FDR at level $\alpha < \alpha^*$ necessarily makes no
rejection with positive probability:

$$\mathbb{P}(R = 0) = 1 - \frac{\mathrm{FDR}}{\mathrm{pFDR}} \ge 1 - \frac{\alpha}{\alpha^*} > 0.$$
The actual value of $\mathbb{P}(R = 0)$ depends on the FDR controlling procedure. Criticality
has been investigated by Chi (2007a) in the context of FDR and pFDR control by the BH95
procedure. The critical value of the BH95 procedure is defined as

$$\alpha^*_{\mathrm{BH95}} = \inf_{u \in [0,1]} \frac{u}{G(u)}.$$

Criticality results in a threshold effect in the asymptotic proportion of rejections by 
the BH95 procedure, which is summarized by Proposition 4. 

Proposition 4 (Criticality of the BH95 procedure (Chi, 2007a)) Let $\rho_m(\alpha) = R_m(\alpha)/m$ be
the fraction of rejections by the BH95 procedure.

1. If $\alpha > \alpha^*_{\mathrm{BH95}}$, then $\rho_m(\alpha)$ converges in probability as $m \to +\infty$ to a positive value, which we
denote by $\rho_\infty(\alpha)$.

2. If $\alpha < \alpha^*_{\mathrm{BH95}}$, then $\rho_m(\alpha)$ converges to 0 in probability as $m \to +\infty$.

Strictly speaking, $\alpha^*_{\mathrm{BH95}}$ is not intrinsic to the multiple comparison problem, as it is
connected to FDR control by the BH95 procedure. However, as $\alpha^*_{\mathrm{BH95}} = \alpha^*/\pi_0$, we have
$\alpha^*_{\mathrm{BH95}} = 0$ if and only if $\alpha^* = 0$. Therefore, the fact that criticality occurs or not (for any
procedure) can be characterized in terms of the critical value of the BH95 procedure, by
the fact that $\alpha^*_{\mathrm{BH95}} = 0$ or not.



3. Criticality and distribution of the test statistics 

As noted by Chi and Tan (2008), criticality is intrinsic to a multiple testing problem: $\alpha^*$
only depends on the characteristics of the model; in particular it does not depend on a
multiple testing procedure. However, when a criticality phenomenon occurs, the actual
lower bound on the target FDR level that ensures non-trivial FDR control by a given
procedure does depend on the procedure (for the BH95 procedure, it is $\alpha^*_{\mathrm{BH95}} = \alpha^*/\pi_0$).

For simplicity, the results presented in this section are written and illustrated specifically
for the BH95 procedure. We begin by providing theoretical interpretation and illustration
of criticality by studying how different families of distributions of the test statistics
can lead to different behaviors in terms of criticality (Section 3.1). Then we discuss expected
and observed practical consequences of criticality by studying the power of the BH95
procedure as a function of the target FDR level $\alpha$ (Section 3.2) in simulations and real data.

We will discuss location problems, that is, problems in which the distribution of the
test statistic under $\mathcal{H}_1$ is a shift from that of the test statistic under $\mathcal{H}_0$: $F_1 = F_0(\cdot - \theta)$ for
some location parameter $\theta > 0$. We will also investigate the case of Student test statistics,
which is not a location problem but is widely used in real data analysis.



3.1 Interpretation and illustration of criticality 

We begin by recalling a characterization of criticality for the BH95 procedure, in terms of
the behavior of the density $g_1$ under the alternative at 0. Under Condition 1, $G$ is concave
and $u \mapsto u/G(u)$ is non-decreasing on [0, 1]. The critical value of the BH95 procedure is then
given by

$$\alpha^*_{\mathrm{BH95}} = \lim_{u \to 0} \frac{u}{G(u)}. \qquad (7)$$

Criticality therefore only depends on the behavior of $G$ at 0. This is not surprising if
we go back to the interpretation of the BH95 procedure proposed in Figure 1 (right panel):
criticality corresponds to situations where $\alpha$ is so small that there is no positive crossing
point between $G$ and the line $y = x/\alpha$. Combining (7) with L'Hôpital's rule, $\alpha^*_{\mathrm{BH95}}$ may
be written as $\lim_{u \to 0} 1/g(u)$, which is a function of the likelihood ratio $f_1/f_0$ of the model.

Therefore, criticality is governed by the behavior of $\frac{f_1}{f_0}(t)$ as $t$ tends to $+\infty$. These results
were established by Chi (2007a) and Chi and Tan (2008) for one-sided p-values, and are
summarized by the following Proposition.

Proposition 5 (Criticality and likelihood ratios (Chi (2007a); Chi and Tan (2008))) Under
Conditions 1 and 2, we have:

1. If $\frac{f_1}{f_0}(t)$ is bounded as $t \to +\infty$, then the density $g_1$ of the p-values under the alternative
has a finite limit at 0 (which we denote by $g_1(0)$). Criticality occurs, and the critical value is
given by

$$\alpha^*_{\mathrm{BH95}} = \frac{1}{\pi_0 + (1 - \pi_0)\, g_1(0)};$$

2. If $\lim_{t \to +\infty} \frac{f_1}{f_0}(t) = +\infty$, then $\lim_{u \to 0} g_1(u) = +\infty$, and $\alpha^*_{\mathrm{BH95}} = 0$. There is no criticality, and
all target FDR levels are attainable.

Note that Proposition 5 holds for both one- and two-sided p-values. We now illustrate
this property in location models (Section 3.1.1) and in the more complicated, but more
realistic, situation of Student test statistics (Section 3.1.2).

3.1.1 Illustration in location models 

In location models the behavior of the likelihood ratio $\frac{f_1}{f_0} = \frac{f_0(\cdot - \theta)}{f_0}$ is closely related to the
tail behavior of the distribution of the test statistics: for a given non-centrality parameter
$\theta$, the heavier the tails, the smaller the difference between $f_0(\cdot - \theta)$ and $f_0$, and the larger the critical
value. The most well-known location problems are the Gaussian and Laplace (double
exponential) location problems, which illustrate distinct behaviors in terms of criticality.

Gaussian test statistics. Assume that the test statistics are distributed as $\mathcal{N}(0, 1)$
under the null hypothesis, and as $\mathcal{N}(\theta, 1)$ under the alternative (with $\theta > 0$). The likelihood
ratio is thus given by

$$\frac{f_1}{f_0}(t) = \exp\left(-\frac{1}{2}(t - \theta)^2 + \frac{1}{2}t^2\right) = \exp\left(\theta t - \frac{\theta^2}{2}\right).$$

As this likelihood ratio is non-decreasing and not bounded as $t \to +\infty$, Proposition 5
implies that there is no criticality in the Gaussian location problem: $\alpha^*_{\mathrm{BH95}} = 0$. Figure 2
illustrates the absence of criticality for the Gaussian location problem: $\alpha^*_{\mathrm{BH95}} = 0$ whatever
the values of $\theta$ and $\pi_0$, as the distribution function has a vertical semi-tangent at the
origin. We now investigate the case of Laplace (double exponential) test statistics, which
have heavier tails than the Gaussian distribution; this results in a positive critical value.

Laplace test statistics. Assume that the density of the test statistics is $f_0 : t \mapsto \frac{1}{2} e^{-|t|}$
under the null hypothesis, and $f_1 : t \mapsto \frac{1}{2} e^{-|t - \theta|}$ under the alternative, with $\theta > 0$. The
corresponding distribution functions under $\mathcal{H}_1$ are derived in Appendix B for one-sided
p-values (Proposition 25). The likelihood ratio of the model is given by $\frac{f_1}{f_0}(t) = e^{|t| - |t - \theta|}$,
that is, $\frac{f_1}{f_0}(t) = e^{-\theta}$ if $t \le 0$, $e^{2t - \theta}$ if $0 \le t \le \theta$, and $e^{\theta}$ if $t \ge \theta$. The likelihood ratio of this model is therefore
Figure 2: Non-criticality for the Gaussian distribution, illustrated with $\theta = 2$ (left) and
$\theta = 3$ (right). Distribution functions of one-sided (solid) and two-sided (dashed)
p-values, for $\pi_0 = 0$, 0.5 and 0.75.



bounded, which results (by Proposition 5) in a positive critical value. By Proposition 25,
we have $\alpha^*_{\mathrm{BH95}} = 1/(\pi_0 + (1 - \pi_0) e^{\theta})$ for one-sided p-values, and $\alpha^*_{\mathrm{BH95}} = 1/(\pi_0 + (1 - \pi_0) \cosh\theta)$
for two-sided p-values. Figure 3 illustrates criticality for the Laplace location problem. The
distribution function of one-sided p-values under a Laplace mixture with proportion $\pi_0$ of
true nulls is linear between 0 and $e^{-\theta}/2$, with slope $\pi_0 + (1 - \pi_0) e^{\theta} = 1/\alpha^*_{\mathrm{BH95}}$ (see Appendix B,
Proposition 25). Criticality occurs for any value of $\theta$ and $\pi_0$ (and would also occur for any
FDR controlling procedure). However, the value of the critical value depends on both $\theta$ and $\pi_0$ (and on
the procedure), as illustrated in Figure 4.

Numerical example in the one-sided Laplace case. As $\alpha^*_{\mathrm{BH95}}$ is a decreasing function
of $\theta$ and of $1 - \pi_0$, the knowledge of an upper bound on $\theta$ and on $1 - \pi_0$ can be translated into a lower
bound on $\alpha^*_{\mathrm{BH95}}$. For example, suppose that we know that $\theta \le 2$ and $\pi_0 \ge 0.75$ in the
one-sided Laplace case. Then $\alpha^*_{\mathrm{BH95}} \ge \frac{1}{0.75 + 0.25 e^{2}} \approx 0.385$, which means that even though $\pi_0$ and
$\theta$ are not exactly known, we know that the BH95 procedure applied in this setting with
any target FDR level $\alpha < 0.385$ has asymptotically null power as the number of tested
hypotheses grows to $+\infty$. In the case when $\pi_0$ is totally unknown, for a given upper bound
on $\theta$, there is still a positive minimal critical value, namely $\underline{\alpha}^*_{\mathrm{BH95}} = e^{-\theta}$, which corresponds to the limit
case when all hypotheses come from the alternative (that is, $\pi_0 = 0$ and $G = G_1$). This
limit case is represented in red in Figure 3. For example, with $\theta \le 2$, we have $\underline{\alpha}^*_{\mathrm{BH95}} \ge e^{-2} \approx 0.135$,
whatever $\pi_0$.
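
The two numerical values of this example can be checked with a short Python sketch of the critical value formulas for the Laplace location problem, $1/(\pi_0 + (1-\pi_0)e^{\theta})$ for one-sided and $1/(\pi_0 + (1-\pi_0)\cosh\theta)$ for two-sided p-values. The function name `laplace_critical_value` is introduced here for illustration only.

```python
import math


def laplace_critical_value(theta, pi0, two_sided=False):
    """Critical value of the BH95 procedure in the Laplace location problem.

    The slope of G_1 at the origin is exp(theta) for one-sided p-values
    and cosh(theta) for two-sided p-values.
    """
    slope_alternative = math.cosh(theta) if two_sided else math.exp(theta)
    return 1.0 / (pi0 + (1.0 - pi0) * slope_alternative)


# Bounds used in the numerical example: theta = 2 with pi0 = 0.75, then pi0 = 0.
print(round(laplace_critical_value(2.0, 0.75), 3))  # 0.385
print(round(laplace_critical_value(2.0, 0.0), 3))   # 0.135
```

For a fixed $\theta$ and $\pi_0$, the two-sided critical value is larger than the one-sided one, because $\cosh\theta < e^{\theta}$ for $\theta > 0$.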

Figure 3: Criticality for the Laplace distribution, illustrated with $\theta = 2$ (left) and $\theta = 3$
(right), respectively. Distribution functions of one-sided (solid) and two-sided
(dashed) p-values for $\pi_0 = 0$, 0.5 and 0.75. [Panel titles: "Laplace, $\theta = 2$" and
"Laplace, $\theta = 3$"; horizontal axis: p-value.]

Figure 4: Critical values for one-sided (left) and two-sided (right) Laplace distributions,
as a function of $\theta$ and $\pi_0$. Solid black lines represent level curves for a few values
of the critical value, which are also marked in the color scale.

Subbotin test statistics. Gaussian and Laplace distributions can be viewed as
instances of a more general class of distributions introduced by Subbotin (1923). Let us define
the density of the test statistics by

$$f_0(t) = \frac{1}{C_\gamma} e^{-|t|^\gamma / \gamma}$$

under $\mathcal{H}_0$, and $f_1(t) = f_0(t - \theta)$ under $\mathcal{H}_1$, where $C_\gamma$ is a normalizing constant that makes
$f_0$ a density. The Gaussian case corresponds to $\gamma = 2$ and the Laplace case to $\gamma = 1$. The
likelihood ratio for this location problem is given by

$$\frac{f_1}{f_0}(t) = \exp\left(\frac{|t|^\gamma}{\gamma} - \frac{|t - \theta|^\gamma}{\gamma}\right).$$

We focus on $\gamma \ge 1$ because it corresponds to situations in which $f_1/f_0$ is non-decreasing. As
$t \to +\infty$, $\left|1 - \frac{\theta}{t}\right|^\gamma = 1 - \frac{\gamma\theta}{t} + o\left(\frac{1}{t}\right)$, so that
$\log\frac{f_1}{f_0}(t) = \frac{t^\gamma}{\gamma}\left(1 - \left|1 - \frac{\theta}{t}\right|^\gamma\right) \sim \theta t^{\gamma - 1}$, and the behavior of $\frac{f_1}{f_0}(t)$
is driven by the value of $\gamma$: if $\gamma = 1$, $\lim_{t \to +\infty} \frac{f_1}{f_0}(t) = e^{\theta}$ and there is a positive critical
value, as noted above. If $\gamma > 1$ (for example in the Gaussian case), $\lim_{t \to +\infty} \frac{f_1}{f_0}(t) = +\infty$
and there is no criticality. Laplace-distributed test statistics appear as a limit case in terms
of criticality: within the family of Subbotin location models, there is no criticality if and
only if the tails of the test statistics are lighter than exponential.



3.1.2 Illustration for Student test statistics 

The study of location models in the preceding section provides insight into the connection 
between tail behavior of the test statistics and criticality. In practice however, test statistics 
are typically assumed to follow a Student distribution, because they have been generated 
from longitudinal observations that can be assumed to be Gaussian with unknown variance. 

Definition 6 (Student distribution) Let $Z_\theta$ be normally distributed with mean $\theta$ and variance 1,
and $Y$ independently distributed as central $\chi^2$ with $k$ degrees of freedom. Then the random variable
$T_{k,\theta} = \frac{Z_\theta}{\sqrt{Y/k}}$ is said to follow a t distribution (Student distribution) with $k$ degrees of freedom and
non-centrality parameter $\theta$. If $\theta = 0$, $T_{k,0}$ is denoted by $T_k$ and is said to follow a (central) t
distribution with $k$ degrees of freedom.

Note that the Student multiple testing problem is not a location model, as a non-central Student
random variable is not a translation from a central Student random variable. As a practi- 
cal illustration for a Student multiple comparison problem, we will consider a microarray 
data set (Golub et al., 1999) which consists of the measured expression level of m = 3051 
genes in blood samples from 38 patients suffering from two types of leukemia: acute lym- 
phoblastic leukemia (ALL, 27 cases) and acute myeloid leukemia (AML, 11 cases). The 
goal of the original study was to find genes that are significantly over- or under-expressed 
in one class of patients with respect to the other class. We have used the data from the 
R package multtest available from Bioconductor (Gentleman et al., 2004). This data was 
preprocessed as described in Dudoit et al. (2002). For each gene, we performed a two-sided 
Student test of the null hypothesis that this gene is equally expressed in the two classes of 
patients. 

We assume that we are observing $(X_1, \dots, X_{n_X})$ independent observations distributed as
$\mathcal{N}(\mu_X, \sigma^2)$ and $(Y_1, \dots, Y_{n_Y})$ independent observations distributed as $\mathcal{N}(\mu_Y, \sigma^2)$. We also
assume that $(X_i)_{1 \le i \le n_X}$ and $(Y_i)_{1 \le i \le n_Y}$ are independent. We focus on the (two-sided)
problem of testing $\mathcal{H}_0 : \mu_X = \mu_Y$ against $\mathcal{H}_1 : \mu_X \ne \mu_Y$.

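Proposition 7 Letting $\bar{X} = \frac{1}{n_X}\sum_{i=1}^{n_X} X_i$ and $\bar{Y} = \frac{1}{n_Y}\sum_{j=1}^{n_Y} Y_j$, define the Student test
statistic of $\mathcal{H}_0$ against $\mathcal{H}_1$ as

$$T_{n_X + n_Y} = \frac{\bar{Y} - \bar{X}}{S_{XY}\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}},$$

where $S_{XY}$, defined by

$$S_{XY}^2 = \frac{1}{n_X + n_Y - 2}\left[\sum_{i=1}^{n_X}(X_i - \bar{X})^2 + \sum_{j=1}^{n_Y}(Y_j - \bar{Y})^2\right],$$

is an estimator of the common standard deviation of the two samples. Under $\mathcal{H}_0$, $T_{n_X + n_Y}$ follows a
central t distribution with $n_X + n_Y - 2$ degrees of freedom; under $\mathcal{H}_1$, $T_{n_X + n_Y}$ follows a non-central
t distribution with $n_X + n_Y - 2$ degrees of freedom and non-centrality parameter

$$\theta = \frac{\mu_Y - \mu_X}{\sigma\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}}.$$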



Remark 8 (effect size) The non-centrality parameter may be written as $\theta = \delta \big/ \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}$, where
the effect size

$$\delta = \frac{\mu_Y - \mu_X}{\sigma} \qquad (9)$$

does not depend on the numbers $n_X$ and $n_Y$ of longitudinal observations in each group. $\delta$ characterizes
the distributions $\mathcal{N}(\mu_X, \sigma^2)$ and $\mathcal{N}(\mu_Y, \sigma^2)$ of the observations.

The following Proposition gives an expression of the likelihood ratio for Student test 
statistics. 

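Proposition 9 (Likelihood ratio for Student test statistics) The likelihood ratio between a
central Student distribution with $k$ degrees of freedom and a non-central Student distribution with $k$
degrees of freedom and non-centrality parameter $\theta$ may be written as

$$\frac{f_1}{f_0}(t) = \exp\left[-\frac{\theta^2}{2}\,\frac{1}{1 + \frac{t^2}{k}}\right] \frac{Hh_k\left(-\frac{t\theta}{\sqrt{k + t^2}}\right)}{Hh_k(0)},$$

where

$$Hh_k(z) = \int_0^{+\infty} \frac{x^k}{k!}\, e^{-\frac{1}{2}(x + z)^2}\, dx.$$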
Jo k\ 



Proposition 10 (Criticality: Student multiple comparison problem) Consider the Student
multiple comparison problem where test statistics are distributed as central Student with $k$
degrees of freedom under $\mathcal{H}_0$ and non-central Student with $k$ degrees of freedom and non-centrality
parameter $\theta$ under $\mathcal{H}_1$. The corresponding likelihood ratio is non-decreasing, and bounded as $|t| \to +\infty$;
therefore, there is a positive critical value, which is given by
$\alpha^*_{\mathrm{BH95}} = \left(\pi_0 + (1 - \pi_0)\frac{Hh_k(-\theta)}{Hh_k(0)}\right)^{-1}$ for one-sided p-values and by
$\alpha^*_{\mathrm{BH95}} = \left(\pi_0 + (1 - \pi_0)\frac{Hh_k(-\theta) + Hh_k(\theta)}{2\,Hh_k(0)}\right)^{-1}$ for two-sided p-values.
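
The critical value of the Student problem has no closed form, but it is easy to evaluate numerically. The sketch below assumes the expressions $\left(\pi_0 + (1-\pi_0)\,Hh_k(-\theta)/Hh_k(0)\right)^{-1}$ (one-sided) and $\left(\pi_0 + (1-\pi_0)\,(Hh_k(-\theta) + Hh_k(\theta))/(2Hh_k(0))\right)^{-1}$ (two-sided), with $Hh_k(z) = \int_0^{+\infty} \frac{x^k}{k!} e^{-(x+z)^2/2}\,dx$. The integral is approximated by a composite Simpson rule on a truncated range; the truncation point and the number of steps are arbitrary choices that should be adequate for moderate $k$ and $\theta$ only, and the function names are ours.

```python
import math


def hh(k, z, upper=60.0, steps=20000):
    """Approximate Hh_k(z) on [0, upper] with Simpson's rule (steps must be even)."""

    def integrand(x):
        if x == 0.0:
            # x^k vanishes at 0 unless k == 0
            return math.exp(-0.5 * z * z) if k == 0 else 0.0
        # work on the log scale to avoid overflow of x^k / k!
        log_value = k * math.log(x) - math.lgamma(k + 1) - 0.5 * (x + z) ** 2
        return math.exp(log_value)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def student_critical_value(k, theta, pi0, two_sided=False):
    """Critical value of BH95 for Student statistics, under the formulas assumed above."""
    base = hh(k, 0.0)
    if two_sided:
        density_at_zero = (hh(k, -theta) + hh(k, theta)) / (2.0 * base)
    else:
        density_at_zero = hh(k, -theta) / base
    return 1.0 / (pi0 + (1.0 - pi0) * density_at_zero)
```

For example, `student_critical_value(36, 2.5, 0.5, two_sided=True)` evaluates the two-sided critical value for 36 degrees of freedom, $\theta = 2.5$ and $\pi_0 = 0.5$. A library implementation of the non-central t density could be used instead of the hand-written quadrature.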

The fact that $\alpha^*_{\mathrm{BH95}} > 0$ is consistent with the fact that the Student distribution has
polynomial tails, that is, heavier tails than in the Laplace location problem, in which criticality
already occurred. This result is illustrated by Figure 5. The parameters of the Student
distribution functions have been chosen as follows. First, the number of degrees of freedom
in the right panel has been chosen to match the actual number of observations in the Golub
data set: $n_X + n_Y - 2 = 27 + 11 - 2 = 36$ degrees of freedom. The value $\theta = 2.5$ has been
chosen empirically to maximize the fit between observed (solid black line) and expected
distributions of two-sided Student p-values: for $\pi_0 = 0.5$ (dashed green line in the right
panel), the fit is quite good.
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As noted above (Remark 8), the effect size S ~ (fiy ~ ^^x)/<^ characterizes the distribu- 
tions of the longitudinal observations, regardless of the number of longitudinal observations. 
Using Equation (9), 5 can be estimated as (5 « 0.9 for this data set. The left panel illustrates 
the distribution functions for the same multiple comparison problem — as characterized by 
its effect size 5 and fraction of true null hypotheses ttq — with a smaller number of obser- 
vations: we have chosen nx = 8 and ny = 3, which corresponds to approximately 30% of 
the original number of observations. The parameters of the associated Student distribution 
are therefore n.^. + ??y — 2 = 9 degrees of freedom, and 6 w 1.3 for 5 = 0.9. 




Figure 5: Criticality for the Student distribution with 9 degrees of freedom and $\theta = 1.3$ (left), and (right) 36 degrees of freedom and $\theta = 2.5$. The effect size is the same for both panels: $\delta = 0.9$. Distribution functions of one-sided (solid) and two-sided (dashed) p-values for $\pi_0 = 0$, $0.5$ and $0.75$. Solid black line in the right panel is the observed distribution of two-sided p-values in the Golub data set.



In the mixture model, we assumed that each gene is either non-differentially expressed ($\mu_x = \mu_y$) or differentially expressed with a common, positive effect size $\delta = (\mu_y - \mu_x)/\sigma$. In practice, the distinction between differentially expressed and non-differentially expressed genes is not as clear-cut. We believe that the closeness of the Student distribution with 36 degrees of freedom and non-centrality parameter $\theta = 2.5$ to the distribution of the observed p-values (dashed green and solid black lines in the right panel of Figure 5) indicates that our model is relevant to real data analysis.

The comparison between the left and the right panel of Figure 5 illustrates the influence of longitudinal sample size on criticality: although the effect size is the same in both panels ($\delta = 0.9$), criticality is much more serious when the sample sizes $n_x$ and $n_y$ are small, because both the number of degrees of freedom $n_x + n_y - 2$ and the non-centrality parameter $\theta = \delta / \sqrt{1/n_x + 1/n_y}$ are increasing functions of $n_x$ and $n_y$.

The influence of longitudinal sample size on criticality has been studied by Chi (2007b). Although the supremum of the likelihood ratio of the Student multiple comparison problem is bounded for a given number of degrees of freedom $k$ and effect size $\delta$ by Proposition 9, this likelihood ratio grows to $+\infty$ whenever $\delta \to 0$ and $k \to +\infty$, provided that $k\delta \to +\infty$ (Chi, 2007b, Lemma 2.1.3). This implies that criticality can be canceled out by increasing the number of longitudinal observations. The influence of longitudinal sample size on power in real data analyses will be discussed in the next section.

3.2 Criticality and power of the BH95 procedure 

In the preceding section we have seen that different families of distributions of the test statistics can lead to different behaviors in terms of criticality, and that when criticality occurs the critical value $\alpha^*$ depends on the parameters of this distribution. However, it is still unclear how FDR controlling procedures behave in practice depending on $\alpha$ and $\alpha^*$, for two main reasons. First, criticality is an asymptotic notion: it characterizes the asymptotic behavior of the proportion of rejected hypotheses as the number of hypotheses grows to infinity (see Proposition 4). Second, criticality only gives a binary interpretation of the situation: either criticality occurs or it does not, and we have little indication of how serious criticality is in a given setting.

In this section, we study the power of the BH95 procedure as a function of the target FDR level $\alpha$, in order to gain insight into the practical consequences of criticality. The power of an FDR controlling procedure at level $\alpha$ can be defined as the proportion of false null hypotheses that are correctly rejected:

$$\Pi_m(\alpha) = \frac{R_m(\alpha) - V_m(\alpha)}{m - m_0}.$$

The corresponding asymptotic power $\Pi_\infty(\alpha)$ is defined as the limit in probability of $\Pi_m(\alpha)$ as $m \to +\infty$. Proposition 11 demonstrates that criticality results in a thresholding effect in the asymptotic power achieved by the BH95 procedure, similarly to what we observed in Proposition 4.

Proposition 11 (Asymptotic power of the BH95 procedure, Chi 2007a) Let $\alpha^*$ be the critical value of the BH95 procedure.

1. If $\alpha < \alpha^*$, $\Pi_\infty(\alpha) = 0$;

2. If $\alpha > \alpha^*$, then

$$\Pi_\infty(\alpha) = \frac{1 - \pi_0 \alpha}{1 - \pi_0}\, \rho_\infty(\alpha).$$

Proposition 11 motivates the following two questions: 

• when $\alpha < \alpha^*$, the BH95 procedure has asymptotically null power; how does the power of the BH95 procedure behave for a finite number of hypotheses?

• when $\alpha > \alpha^*$, the BH95 procedure has asymptotically positive power; how large is this power, both asymptotically and for a finite number of hypotheses?

In order to address these questions, we compare the expected and observed power of 
the BH95 procedure for different location models using a simulation study (Section 3.2.1). 
Then we illustrate the influence of longitudinal sample size on power and criticality in a real 
microarray data set, and discuss the connection with asymptotic results (Section 3.2.2). 

3.2.1 Observed and expected power of the BH95 procedure in location models

Figure 6 displays power as a function of $\alpha$ in the same settings as Figure 2 and Figure 3: Gaussian (top) and Laplace (bottom) distributions, with non-centrality parameter $\theta = 1$ (left) or 2 (right), with $\pi_0 \in \{0, 0.75, 0.9\}$. Asymptotic power ($\Pi_\infty$) is represented by solid lines. This figure also summarizes the results of a simulation study that we performed in order to compare $\Pi_\infty$ with the power $\Pi_m$ achieved for a finite number of observations. For each simulation unit, $m = 1000$ p-values were generated according to the above settings. Dashed and dash-dotted lines correspond to the median, 5% and 95% quantiles of $\Pi_m$ over $B = 1000$ repetitions of the simulation. For the Laplace problem, where criticality occurs, critical regions for each value of $\pi_0$ are identified by colored rectangles.

The asymptotic power curves (solid lines) of the Laplace distributions in Figure 6 illustrate the singularity in the power function at $\alpha = \alpha^*$ characterized by Proposition 11: the asymptotic power is identically 0 for $\alpha < \alpha^*$ and positive for $\alpha > \alpha^*$, with a vertical semi-tangent at $\alpha = \alpha^*$. As there is no criticality for the Gaussian distribution ($\alpha^* = 0$), there is no such singularity for this distribution, and the asymptotic power is a smooth function of the target level $\alpha$.

The curves $\Pi_m$ and $\Pi_\infty$ are quite similar for $\alpha > \alpha^*$, while $\Pi_m$ tends to be slightly larger than $\Pi_\infty$ for $\alpha < \alpha^*$: the singularity at $\alpha = \alpha^*$ is smoothed in the observed power function. This suggests that the effect of criticality in real data analysis is less dichotomous than suggested by the mere distinction of $\alpha < \alpha^*$ versus $\alpha > \alpha^*$. This conclusion is reinforced by the comparison of power functions across parameters within a given family of distributions. For the Gaussian distribution when the non-centrality parameter $\theta$ is small (top left panel), no or very few rejections are made for small values of $\alpha$, even though there is no criticality ($\alpha^* = 0$). For any positive target level $\alpha$, the associated power will eventually be positive for $m$ large enough, but it can still be quite small if $\theta$ is small and $\pi_0$ is large. For the Laplace distribution (bottom row), the observed (finite sample) power can be positive even for $\alpha < \alpha^*$, although it is generally quite small.
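The kind of finite-sample experiment described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the code used for Figure 6: the null density is taken to be the standard Laplace density $\frac{1}{2}e^{-|x|}$, one-sided p-values are computed as $1 - F_0(T)$, the critical value is taken as $(\pi_0 + (1 - \pi_0)e^{\theta})^{-1}$, and all function names are hypothetical. A single simulated data set is used, whereas the study above relies on 1000 repetitions.

```python
import math
import random


def laplace_one_sided_pvalue(t: float) -> float:
    """p = 1 - F_0(t) for the standard Laplace density f_0(x) = exp(-|x|) / 2."""
    return 0.5 * math.exp(-t) if t >= 0 else 1.0 - 0.5 * math.exp(t)


def simulate_pvalues(m: int, pi0: float, theta: float, rng: random.Random):
    """Draw m one-sided p-values; each hypothesis is a true null with probability pi0."""
    pvalues, is_null = [], []
    for _ in range(m):
        null = rng.random() < pi0
        noise = rng.expovariate(1.0) * (1 if rng.random() < 0.5 else -1)
        pvalues.append(laplace_one_sided_pvalue(noise + (0.0 if null else theta)))
        is_null.append(null)
    return pvalues, is_null


def bh95_power(pvalues, is_null, alpha: float) -> float:
    """Power (R - V) / (m - m0) of the BH95 step-up procedure; assumes m0 < m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_rejected = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            n_rejected = rank  # keep the largest rank satisfying the bound
    true_positives = sum(1 for i in order[:n_rejected] if not is_null[i])
    return true_positives / (m - sum(is_null))


rng = random.Random(0)
pi0, theta = 0.75, 2.0
alpha_star = 1.0 / (pi0 + (1.0 - pi0) * math.exp(theta))
pvalues, is_null = simulate_pvalues(1000, pi0, theta, rng)
for alpha in (0.5 * alpha_star, 1.5 * alpha_star):
    print(f"alpha = {alpha:.3f}, observed power = {bh95_power(pvalues, is_null, alpha):.3f}")
```

Because the rejection set of the BH95 procedure is nested in $\alpha$ for a fixed set of p-values, the observed power is non-decreasing in $\alpha$; repeating the simulation over many seeds would give the quantile bands shown in the figure.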

3.2.2 Longitudinal sample size and criticality: theory and practice for the 
Student distribution 

In order to study how the results presented in the preceding section translate to real data analysis, we go back to the analysis of the Golub et al. (1999) microarray data set described in Section 3.1.2. As we do not know which genes are truly differentially expressed in this study, we cannot calculate the power $\Pi_m(\alpha) = (R_m(\alpha) - V_m(\alpha))/(m - m_0)$ at threshold $\alpha$ as we did in the preceding section. Indeed, both the total number $m_0$ of true null hypotheses and the number $V_m(\alpha)$ of false positives at threshold $\alpha$ are unobserved. Therefore, we focus on the fraction $\rho_m(\alpha)$ of hypotheses rejected by the BH95 procedure at threshold $\alpha$:

$$\rho_m(\alpha) = R_m(\alpha)/m.$$

We emphasize that the asymptotic power $\Pi_\infty$ and the asymptotic fraction of rejected hypotheses $\rho_\infty$ are expected to have the same type of behavior: both are null for $\alpha < \alpha^*$, and they are connected by $\Pi_\infty(\alpha) = \frac{1 - \pi_0 \alpha}{1 - \pi_0}\, \rho_\infty(\alpha)$ when $\alpha > \alpha^*$ (Propositions 4 and 11).

Figure 7 compares $\rho_\infty(\alpha)$ (dashed curves) for Student test statistics to the observed fraction $\rho_m(\alpha)$ of genes declared differentially expressed by the BH95 procedure at level $\alpha$ in the Golub et al. (1999) data set (solid curves). Red curves correspond to the entire data set (38 samples), and green and blue curves correspond to subsets of 60% and 30% of the original data set, respectively. For each sampling rate $s$, 100 resamplings of the original data set were performed as follows: $\lfloor s \cdot n_x \rfloor$ and $\lfloor s \cdot n_y \rfloor$ samples were chosen randomly among ALL and AML samples, respectively, and the BH95 procedure was applied to Student two-sided p-values of differential expression between the two groups.

For $\rho_\infty$ (dashed lines), the parameters of the Student multiple comparison problem were chosen as described in the preceding section: the signal-to-noise ratio was set to $\delta = 0.9$ and the proportion of true null hypotheses to $\pi_0 = 0.5$. These parameters were kept constant across sampling rates, and the number of degrees of freedom and non-centrality parameters were adjusted accordingly for each sampling fraction, as described in Table 1.

Figure 6: Power $\Pi_m$ achieved by the BH95 procedure as a function of target FDR level $\alpha$, for different flavors of one-sided Gaussian and Laplace problems: $\theta = 1$ (left) and $\theta = 2$ (right). Top: Gaussian distribution (no criticality); bottom: Laplace distribution (criticality). Solid lines represent $\Pi_\infty(\alpha)$, the limit in probability of $\Pi_m(\alpha)$ as $m \to +\infty$. Dashed and dash-dotted lines correspond to the median, 5% and 95% quantiles of $\Pi_m(\alpha)$ over $B = 1000$ simulations. Colored regions in the bottom plot illustrate critical regions for each value of $\pi_0$; in areas where the critical regions overlap, the color corresponding to the smallest $\alpha^*$ has been used.

Figure 7: The fraction of hypotheses rejected by the BH95 procedure as a function of target FDR level $\alpha$ illustrates the influence of longitudinal sample size on power and criticality on the Golub data set. Solid lines correspond to random samplings of the original data set, each color corresponding to a different sampling rate: full data set (red, 38 samples), sampling rate 0.6 (green, 23 samples), sampling rate 0.3 (blue, 11 samples). For sampling rates 0.3 and 0.6, 100 random resamplings of the original data set have been performed. Dashed lines represent the asymptotic fraction of hypotheses rejected assuming a mixture model of Student distributions (see main text for details).



Sampling rate                                                      100%    60%    30%
ALL sample size            $n_x$                                    27      16     7
AML sample size            $n_y$                                    11      7      3
Degrees of freedom         $k = n_x + n_y - 2$                      36      21     8
Non-centrality parameter   $\theta = \delta/\sqrt{1/n_x + 1/n_y}$   2.5     2      1.3

Table 1: Parameters used for the resampling study in Figure 7. $\delta$ was set to 0.9.

The solid red curve in Figure 7 corresponds to the observed fraction of rejections in the Golub data set, while the dashed red curve corresponds to the asymptotic fraction of rejected hypotheses of a Student multiple comparison problem with $\delta = 0.9$, $\pi_0 = 0.5$, $n_x = 27$ and $n_y = 11$. We interpret the closeness of these two curves as the combination of two elements. First, the parameters $\delta$ and $\pi_0$ adequately describe the data set. This is consistent with the fact that $\theta$ and $\pi_0$ were chosen to maximize the fit between the expected and observed distribution functions of the two-sided Student p-values (Figure 5, right panel). Second, the asymptotic fraction of rejected hypotheses is a good approximation of the observed fraction of rejected hypotheses.

There is non-negligible variability in the distribution of observed fractions of rejected hypotheses in each of the two sampling scenarios (solid green and blue curves). The asymptotic fractions of rejections (dashed curves) are consistent with the observed fractions (solid curves), although for a sampling rate of 60% (green) the asymptotic fraction seems to underestimate the observed fractions for a small target FDR.

4. Estimation of $\pi_0$

In this section we focus on the estimation of the fraction $\pi_0$ of true null hypotheses in the settings described in Section 2.1, where the density $g_1$ of the p-values under the alternative hypothesis is assumed to be decreasing (Condition 1). As $g_1$ is unknown, a natural approach to estimate $\pi_0$ from observed (one- or two-sided) p-values drawn from a mixture with density $g = \pi_0 + (1 - \pi_0) g_1$ is to focus on p-values near 1. In this section, $\hat{\pi}_0$ denotes a generic estimator of $\pi_0$ based on this idea. A number of such estimators have been studied in this context (Efron et al., 2001; Genovese and Wasserman, 2004; Meinshausen and Bühlmann, 2005; Meinshausen and Rice, 2006; Schweder and Spjøtvoll, 1982; Langaas et al., 2005; Storey et al., 2004; Benjamini et al., 2006).

The problem of estimating $\pi_0$ is not only of interest in itself; it is also motivated by power considerations in multiple testing problems: for example, using the plug-in procedure BH95($\alpha/\hat{\pi}_0$), where $\hat{\pi}_0$ is an estimator of $\pi_0$, yields tighter FDR control than the standard BH95 procedure (Benjamini and Hochberg, 2000). The goals of this section are to understand what drives the regularity properties of these estimators of $\pi_0$ in our setting, and to investigate the consequences of these regularity properties in terms of the FDR controlling capabilities of the associated plug-in BH95 procedures.

We begin by pointing out a connection between criticality in one-sided, symmetric location models and a necessary condition to achieve consistency of $\hat{\pi}_0$ (Section 4.1). Then we show how convergence rates of $\hat{\pi}_0$ are connected to regularity properties of $g_1$ near 1 (Section 4.2). Finally, we prove that the convergence rate of the False Discovery Proportion (FDP) achieved by plug-in procedures of the form BH95($\alpha/\hat{\pi}_0$) is determined by the convergence rate of $\hat{\pi}_0$ (Section 4.3) and conclude by studying convergence rates in Gaussian and Laplace location models (Section 4.4).

4.1 Consistency, purity and criticality 

Since we are focusing on estimators of $g(1) = \pi_0 + (1 - \pi_0) g_1(1)$, a necessary condition for such an estimator to be consistent for $\pi_0$ is

$$g_1(1) = 0.$$

This is the purity condition introduced by Genovese and Wasserman (2004). Criticality is related to the behavior of $g_1$ at 0, and purity is related to the behavior of $g_1$ at 1. We begin by identifying a connection between purity and criticality in one-sided symmetric location models, where we have $f_1(x) = f_0(x - \theta)$ (location model) and $f_0(-x) = f_0(x)$ (symmetry, i.e. Condition 2).
Lemma 12 (Likelihood ratios in symmetric location models) Consider the multiple location problem in which test statistics have densities $f_0$ under $\mathcal{H}_0$, and $f_1 = f_0(\cdot - \theta)$ under $\mathcal{H}_1$ for some $\theta \neq 0$. Under Condition 2 (symmetry), we have

$$\lim_{t \to -\infty} \frac{f_1}{f_0}(t) = \lim_{t \to +\infty} \frac{f_0}{f_1}(t).$$

Proposition 13 (Purity and criticality in one-sided symmetric location models) Let $g_1^+$ be the density of one-sided p-values under the alternative hypothesis, and $\alpha^*$ the critical value of the BH95 procedure for the corresponding multiple comparison problem. Under Conditions 1 and 2,

1. Non-criticality and purity are equivalent: $\alpha^* = 0$ if and only if $g_1^+(1) = 0$;

2. If $\lim_{t \to +\infty} \frac{f_1}{f_0}(t)$ is finite, then $\alpha^* = \left(\pi_0 + (1 - \pi_0) g_1^+(0)\right)^{-1}$ and $g^+(1) = \pi_0 + (1 - \pi_0) g_1^+(1)$ are connected by $g_1^+(0)\, g_1^+(1) = 1$.

Going back to the examples of Section 3, as there is no criticality in the Gaussian case, the purity condition is verified and $\pi_0$ can be consistently estimated using an estimator of $g^+(1)$. In the Laplace case, there is a positive critical value, given by

$$\alpha^* = \frac{1}{\pi_0 + (1 - \pi_0) e^{\theta}}$$

for one-sided p-values, and $\pi_0$ cannot be consistently estimated based on the behavior of the p-values at 1 because $g^+(1) = \pi_0 + (1 - \pi_0) e^{-\theta} > \pi_0$. The situation is markedly different for two-sided p-values, as $g_1(1)$ is not determined by the behavior of the likelihood ratio at $+\infty$ but at 0:
Proposition 14 (Two-sided symmetric multiple testing problems are generally impure) Let $g_1^{\pm}$ be the density of the two-sided p-values under the alternative hypothesis, and $\alpha^*$ the critical value of this multiple comparison problem. Under Condition 2 (symmetry), we have:

$$g_1^{\pm}(1) = \frac{f_1}{f_0}(0).$$

Proposition 14 directly follows from equation (6), combined with the fact that $F_0^{-1}(1/2) = 0$ in symmetric models. As a consequence, criticality and purity are not equivalent for two-sided p-values. For example, the two-sided Gaussian location problem is always impure, since $g_1^{\pm}(1) = e^{-\theta^2/2} > 0$, but has no criticality: $\lim_{u \to 0} 1/g_1^{\pm}(u) = 0$.
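The two situations discussed in this section can be checked numerically. The sketch below is illustrative and relies on assumptions of ours: for the one-sided Laplace model, the null density is taken to be $\frac{1}{2}e^{-|x|}$, so that the likelihood ratio tends to $e^{\theta}$ at $+\infty$ and to $e^{-\theta}$ at $-\infty$; for the two-sided Gaussian model, the density of the p-values under the alternative is written as $\frac{1}{2}\left[\frac{f_1}{f_0}(t) + \frac{f_1}{f_0}(-t)\right]$ with $t = F_0^{-1}(1 - u/2)$, which is the standard expression for a symmetric null distribution and is assumed here to coincide with equation (6). Function names are hypothetical.

```python
import math
from statistics import NormalDist


def laplace_one_sided(pi0: float, theta: float):
    """Critical value, g(1), and the product g_1(0) * g_1(1) in the Laplace model."""
    g1_at_0 = math.exp(theta)    # limit of f_1 / f_0 at +infinity
    g1_at_1 = math.exp(-theta)   # limit of f_1 / f_0 at -infinity
    alpha_star = 1.0 / (pi0 + (1.0 - pi0) * g1_at_0)
    g_at_1 = pi0 + (1.0 - pi0) * g1_at_1
    return alpha_star, g_at_1, g1_at_0 * g1_at_1


def gaussian_two_sided_g1(u: float, theta: float) -> float:
    """Density at u (0 < u <= 1) of two-sided p-values under the alternative N(theta, 1)."""
    t = NormalDist().inv_cdf(1.0 - u / 2.0)  # |T| for which the two-sided p-value is u

    def likelihood_ratio(x: float) -> float:
        return math.exp(-theta ** 2 / 2.0 + theta * x)

    return 0.5 * (likelihood_ratio(t) + likelihood_ratio(-t))


alpha_star, g_at_1, product = laplace_one_sided(pi0=0.75, theta=1.0)
print("Laplace, one-sided:", alpha_star, g_at_1, product)
print("Gaussian, two-sided, g_1(1):", gaussian_two_sided_g1(1.0, theta=1.5))
```

In the Laplace case the product of the two limits equals 1 and $g(1)$ exceeds $\pi_0$ (impurity with criticality), whereas in the two-sided Gaussian case $g_1^{\pm}(1)$ is positive while $g_1^{\pm}(u)$ grows without bound as $u \to 0$ (impurity without criticality).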

4.2 Asymptotic properties of non-parametric estimators of $\pi_0$

In this section we consider non-parametric estimators of $\pi_0$ based on the distribution of the p-values near 1, and show how asymptotic properties of such estimators are driven by the regularity of $g$ near 1. As discussed in Section 4.1, such estimators may or may not achieve consistency, depending on whether the purity condition $g_1(1) = 0$ is met. We let $\overline{\pi}_0 = g(1)$, that is,

$$\overline{\pi}_0 = \pi_0 + (1 - \pi_0) g_1(1).$$

4.2.1 Non-parametric estimators of $\pi_0$ with known convergence rates

To the best of our knowledge, the only non-parametric estimators of $\pi_0$ for which convergence rates have been established in our setting are those proposed by Storey (2002), Swanepoel (1999) and Hengartner and Stark (1995). The use of these estimators in the context of multiple testing problems was discussed by Genovese and Wasserman (2004). We briefly review the asymptotic properties of these estimators stated in Genovese and Wasserman (2004).

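Storey's estimator. Adapting a method originally proposed by Schweder and Spjøtvoll (1982), Storey (2002) defined $\hat{\pi}_0(\lambda) = \frac{1 - G_m(\lambda)}{1 - \lambda}$ for $0 < \lambda < 1$. As a smooth functional of the empirical distribution of the p-values, this estimator has the following asymptotic distribution, provided that $G(\lambda) < 1$ (Genovese and Wasserman, 2004):

$$\sqrt{m} \left( \hat{\pi}_0(\lambda) - \frac{1 - G(\lambda)}{1 - \lambda} \right) \rightsquigarrow \mathcal{N}\left(0, \frac{G(\lambda)(1 - G(\lambda))}{(1 - \lambda)^2}\right).$$

This estimator converges at the parametric rate $1/\sqrt{m}$, and it is asymptotically biased (even if the purity condition is met) because $\frac{1 - G(\lambda)}{1 - \lambda} > \pi_0$ for $\lambda < 1$.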
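For concreteness, Storey's estimator amounts to counting the p-values above $\lambda$ and rescaling by the length of the interval $(\lambda, 1]$. The following sketch is a direct transcription of this ratio; the function name `storey_pi0` and the toy p-values are ours.

```python
def storey_pi0(pvalues, lam: float) -> float:
    """pi0_hat(lambda) = #{p_i > lambda} / (m * (1 - lambda)), for 0 <= lambda < 1."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lambda must lie in [0, 1)")
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)  # m * (1 - G_m(lambda))
    return n_above / (m * (1.0 - lam))


toy_pvalues = [0.01, 0.02, 0.03, 0.04, 0.2, 0.3, 0.55, 0.7, 0.8, 0.95]
print(storey_pi0(toy_pvalues, lam=0.5))
```

Nothing prevents the ratio from exceeding 1 on a finite sample; in practice it is often truncated at 1, which this sketch does not do.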

Confidence envelopes for the density. Hengartner and Stark (1995) derived a finite sample confidence envelope for a monotone density. Assuming that $G$ is concave and that $g$ is Lipschitz in a neighborhood of 1, the resulting estimator, which we denote by $\hat{\pi}_0^{HS}$, converges to $\overline{\pi}_0$ at rate $(\ln m)^{1/3} m^{-1/3}$.

Spacings-based estimator. Swanepoel (1999) proposed a two-step estimator of the minimum of an unknown density based on the distribution of the spacings between observations: first, the location of the minimum is estimated, and then the density at this point is itself estimated. Under regularity assumptions on the derivatives of $g$ at the value at which the density $g$ achieves its minimum (lower-order derivatives null, and the next one bounded away from 0 and $+\infty$ and Lipschitz), for any $\delta > 0$, there exists an estimator converging at rate $(\ln m)^{\delta} m^{-2/5}$ to the true minimum.

In our setting, the Lipschitz condition on $g$ is unnecessary: the minimum of $g$ is necessarily achieved at 1 because $g$ is non-increasing (under Condition 1), so the first step of the estimation may be omitted. The corresponding estimator is denoted by $\hat{\pi}_0^{Sw}$.



4.2.2 Asymptotic properties and regularity near 1 

We will show in this section that the differences in the asymptotic properties of these estimators of $\pi_0$ in our context are in fact driven by the differences in the regularity assumptions that were made, rather than by the specific form of the estimators. As these estimators are essentially estimators of $g(1)$, their asymptotic properties are driven by the regularity of $g$ near 1.

Storey's estimator is asymptotically biased (even if the purity condition is met) because $\frac{1 - G(\lambda)}{1 - \lambda} > \pi_0$ for $\lambda < 1$. In order to make this estimator consistent for the estimation of $\overline{\pi}_0$, we let $h = 1 - \lambda$ go to 0 as the number $m$ of tested hypotheses goes to $+\infty$. The asymptotic bias and variance of the corresponding estimator are derived in Proposition 15:

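Proposition 15 (Asymptotic bias and variance of $\hat{\pi}_0(1 - h_m)$) Let

$$\hat{\pi}_0(\lambda) = \frac{1 - G_m(\lambda)}{1 - \lambda}$$

for $0 < \lambda < 1$. Let $h_m$ be a positive sequence such that $h_m \to 0$.

1. If $m h_m \to +\infty$ as $m \to +\infty$, then

$$\sqrt{m h_m} \left( \hat{\pi}_0(1 - h_m) - \mathbb{E}\left[\hat{\pi}_0(1 - h_m)\right] \right) \rightsquigarrow \mathcal{N}(0, \overline{\pi}_0).$$

2. Assume that $g_1$ is $k$ times differentiable at 1, with $g_1^{(l)}(1) = 0$ for $1 \leq l < k$ and $g_1^{(k)}(1) \neq 0$. Then

$$\mathbb{E}\left[\hat{\pi}_0(1 - h_m)\right] - \overline{\pi}_0 \underset{m \to +\infty}{\sim} (-1)^k (1 - \pi_0)\, \frac{g_1^{(k)}(1)}{(k+1)!}\, h_m^k.$$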

Proposition 15 shows that the bias of Storey's original estimator, $\frac{1 - G(\lambda)}{1 - \lambda} - \overline{\pi}_0$, can be canceled out asymptotically by letting $h_m = 1 - \lambda \to 0$, at the price of a reduction of the convergence rate: the bandwidth $h_m$ balances bias and variance of Storey's estimator. Moreover, if the purity condition is met, then we have $\overline{\pi}_0 = \pi_0$ and $\hat{\pi}_0(1 - h_m)$ is a consistent estimator of $\pi_0$.

Note that only the bias term in Proposition 15 depends on the regularity of the distribution: the asymptotic bias is of order $h_m^k$, while the asymptotic variance of $\hat{\pi}_0(1 - h_m)$ is of order $(m h_m)^{-1}$, regardless of the regularity of the distribution. A natural way to resolve this bias/variance trade-off when the regularity of the distribution is known is to calibrate $h_m$ such that the Mean Squared Error (MSE) of the corresponding estimator is minimum.

Proposition 16 (Optimal bandwidth: Storey's estimator) Assume that $g_1$ is $k$ times differentiable at 1, with $g_1^{(l)}(1) = 0$ for $1 \leq l < k$, and $g_1^{(k)}(1) \neq 0$, for some positive integer $k$.

1. The optimal bandwidth in terms of MSE is of order $h_m^*(k) = m^{-1/(2k+1)}$, and the corresponding MSE is of order $m^{-2k/(2k+1)}$.

2. Taking $h_m(k) = h_m^*(k)\, \eta_m^2$, where $\eta_m \to 0$ as $m \to +\infty$, we have

$$m^{k/(2k+1)}\, \eta_m \left( \hat{\pi}_0(1 - h_m(k)) - \overline{\pi}_0 \right) \rightsquigarrow \mathcal{N}(0, \overline{\pi}_0).$$

As a consequence, if we allow the parameter $\lambda$ of Storey's estimator to go to 1 as $m \to +\infty$, the resulting estimator essentially achieves the same convergence rates as $\hat{\pi}_0^{HS}$ and $\hat{\pi}_0^{Sw}$:

$\hat{\pi}_0^{HS}$: Assume that $g_1$ is differentiable at 1. This is a slightly stronger assumption than that made by Hengartner and Stark (1995). Then Proposition 16 with $k = 1$ ensures that the convergence rate of Storey's estimator with bandwidth $m^{-1/3} \eta_m^2$, where $\eta_m = (\ln m)^{-1/3}$ goes to 0 as $m \to +\infty$, is $(\ln m)^{1/3} m^{-1/3}$. This is the convergence rate of Hengartner and Stark's estimator.

$\hat{\pi}_0^{Sw}$: Assume that $g_1$ is twice differentiable at 1. Then Proposition 16 with $k = 2$ ensures that the convergence rate of Storey's estimator with bandwidth $m^{-1/5} \eta_m^2$, where $\eta_m = (\ln m)^{-\delta}$ goes to 0 as $m \to +\infty$, is $(\ln m)^{\delta} m^{-2/5}$ for any fixed $\delta > 0$. This is the convergence rate of Swanepoel's estimator.
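The orders of magnitude in Proposition 16 follow from balancing a squared bias of order $h^{2k}$ against a variance of order $(mh)^{-1}$. The sketch below makes this explicit for a stylized mean squared error $(a h^k)^2 + b/(mh)$, whose minimizer is $h^* = \left(b / (2k a^2 m)\right)^{1/(2k+1)}$. The constants $a$ and $b$ (`bias_const`, `var_const`) are placeholders of ours, not quantities computed in the text.

```python
def mse(h: float, m: int, k: int, bias_const: float = 1.0, var_const: float = 1.0) -> float:
    """Stylized MSE: squared bias (bias_const * h^k)^2 plus variance var_const / (m h)."""
    return (bias_const * h ** k) ** 2 + var_const / (m * h)


def optimal_bandwidth(m: int, k: int, bias_const: float = 1.0, var_const: float = 1.0) -> float:
    """Minimizer of the stylized MSE, of order m^(-1 / (2k + 1))."""
    return (var_const / (2 * k * bias_const ** 2 * m)) ** (1.0 / (2 * k + 1))


for k in (1, 2):
    for m in (10 ** 3, 10 ** 6):
        h_star = optimal_bandwidth(m, k)
        print(f"k = {k}, m = {m}: h* = {h_star:.4g}, MSE = {mse(h_star, m, k):.4g}")
```

Multiplying $m$ by 1000 divides the optimal bandwidth by $1000^{1/(2k+1)}$ and the minimal MSE by $1000^{2k/(2k+1)}$, that is, rates $m^{-1/3}$ and $m^{-2/3}$ for $k = 1$, and $m^{-1/5}$ and $m^{-4/5}$ for $k = 2$.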

4.2.3 Kernel estimators 

The examples developed so far in this section illustrate the fact that the convergence rates of $\hat{\pi}_0$ are determined by the regularity of $g$ near 1. The convergence rates we obtained are typical convergence rates for non-parametric estimators. We prove that the same type of result holds for a broad class of kernel estimators of $g(1)$.

Definition 17 (Kernel of order $\ell$) A kernel of order $\ell \in \mathbb{N}$ is a function $K : \mathbb{R} \to \mathbb{R}$ such that the functions $u \mapsto u^j K(u)$ are integrable for any $j = 0, \ldots, \ell$, and verify $\int_{\mathbb{R}} K = 1$, and $\int_{\mathbb{R}} u^j K(u)\, du = 0$ for $j = 1, \ldots, \ell$.

Definition 18 (Kernel estimator of a density) The kernel estimator of a density $g$ at $p_0$ based on $m$ independent, identically distributed observations $P_1, \ldots, P_m$ from $g$ is defined by

$$\hat{g}(p_0) = \frac{1}{m h} \sum_{i=1}^{m} K\left(\frac{P_i - p_0}{h}\right),$$

where $h > 0$ is called the bandwidth of the estimator, and $K$ is a kernel.

The estimator proposed by Storey (2002) is a kernel estimator with asymmetric rectangular kernel of order 0, and bandwidth $1 - \lambda$. We generalize the results obtained so far in Section 4.2 to more general kernels. Tsybakov (2009) established lower bounds on the convergence rate of kernel estimators of $g(1)$, depending on the regularity of $g$ at 1. If $g$ is $k$ times differentiable at 1, with $g^{(k)}(1) \neq 0$, considering a kernel estimator $\hat{g}(1)$ associated with a $k$-th order kernel and fixed bandwidth $h$, the asymptotic variance of $\hat{g}(1)$ is of order $(mh)^{-1}$ and the asymptotic bias of $\hat{g}(1)$ is of order $h^k$.
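The identification of Storey's estimator with a kernel estimator can be verified on a toy example. The sketch below assumes the usual form of a kernel density estimator, $\frac{1}{mh}\sum_i K\left((P_i - p_0)/h\right)$, evaluated at $p_0 = 1$ with the one-sided rectangular kernel $K = \mathbf{1}_{(-1, 0]}$ and bandwidth $h = 1 - \lambda$. The function names and the toy p-values are hypothetical.

```python
def kernel_density_estimate(observations, point: float, bandwidth: float, kernel) -> float:
    """g_hat(point) = 1 / (m h) * sum_i K((P_i - point) / h)."""
    m = len(observations)
    return sum(kernel((p - point) / bandwidth) for p in observations) / (m * bandwidth)


def one_sided_rectangular_kernel(u: float) -> float:
    """Indicator of (-1, 0]: integrates to one and only uses points left of the target."""
    return 1.0 if -1.0 < u <= 0.0 else 0.0


def storey_pi0(pvalues, lam: float) -> float:
    """Storey's estimator #{p_i > lambda} / (m * (1 - lambda))."""
    m = len(pvalues)
    return sum(1 for p in pvalues if p > lam) / (m * (1.0 - lam))


toy_pvalues = [0.01, 0.02, 0.03, 0.04, 0.2, 0.3, 0.55, 0.7, 0.8, 0.95]
lam = 0.5
kernel_value = kernel_density_estimate(toy_pvalues, 1.0, 1.0 - lam, one_sided_rectangular_kernel)
print(kernel_value, storey_pi0(toy_pvalues, lam))
```

The two numbers agree because $P_i > \lambda$ is equivalent to $(P_i - 1)/(1 - \lambda) > -1$ for p-values in $[0, 1]$.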

As for the special case of Storey's estimator, optimal convergence results for kernel estimators may be obtained by letting $h$ go to 0 as $m \to +\infty$ and balancing asymptotic bias and variance in order to minimize MSE.

Proposition 19 (Optimal bandwidth: $k$-th order kernel estimator, Tsybakov (2009)) Assume that $g$ is $k$ times differentiable at 1, with $g^{(k)}(1) \neq 0$. Let $\hat{g}(1)$ be a kernel estimator with bandwidth $h_m$, associated with a $k$-th order kernel.

1. The optimal bandwidth for $\hat{g}(1)$ in terms of MSE is of order $h_m^*(k) = m^{-1/(2k+1)}$, and the corresponding MSE is of order $m^{-2k/(2k+1)}$.

2. Taking $h_m(k) = h_m^*(k)\, \eta_m^2$, where $\eta_m \to 0$ as $m \to +\infty$, we have

$$m^{k/(2k+1)}\, \eta_m \left( \hat{g}(1) - g(1) \right) \rightsquigarrow \mathcal{N}(0, g(1)).$$

As a consequence, the convergence rate of the optimal kernel estimator of $g(1)$ directly depends on the regularity $k$ of $g$ at 1. Using this class of kernel estimators, we obtain the same convergence rates as in the special cases of Storey's, Hengartner and Stark's, or Swanepoel's estimators, under essentially the same regularity conditions, that is, $k$-th order differentiability of the density $g_1$ of the p-values under the alternative. The only difference is that the assumption that the first $k - 1$ derivatives of $g_1$ are null at 1 for Storey's estimator is not needed for the kernel estimators used here, as they are $k$-th order kernels.

4.3 Convergence rate of plug-in procedures

We illustrate a connection between the selection and the estimation problem, which can be viewed as a motivation for the estimation of $\pi_0$ in multiple testing problems. As noted in Section 2.2, the BH95 procedure at level $\alpha$ controls FDR at level $\pi_0 \alpha$; this triggered the development of plug-in versions of this procedure (Benjamini and Hochberg, 2000), which estimate $\pi_0$ by $\hat{\pi}_0 < 1$, and apply the BH95 procedure at $\alpha/\hat{\pi}_0$, yielding a larger number of significant hypotheses for the same target FDR level. We elaborate on this connection between selection and estimation by showing that the convergence rate of a given estimator $\hat{\pi}_0$ of $\pi_0$ determines the asymptotic FDR controlling capabilities of the corresponding plug-in procedure.

The False Discovery Proportion (FDP) achieved by a broad class of FDR controlling procedures including plug-in procedures based on the BH95 procedure has been studied by Neuvial (2008), for estimators of $\pi_0$ that depend on the observations only through the empirical distribution function of the p-values (Storey, 2002; Storey et al., 2004; Benjamini et al., 2006). As a consequence, their convergence rate is $1/\sqrt{m}$ and the FDP achieved by the corresponding plug-in procedures also converges at rate $1/\sqrt{m}$ to their asymptotic FDR in the sub-critical case (Neuvial, 2008). However, these estimators are not consistent for $\overline{\pi}_0 = g(1)$, as they are essentially estimators of $\frac{1 - G(u_0)}{1 - u_0}$, at some $u_0 < 1$ (for example, $u_0 = \lambda$ for Storey's estimator). Conversely, the estimators studied in Section 4.2 achieve consistent estimation of $\overline{\pi}_0$, at the price of a slower convergence rate. Here we show how these results translate in terms of asymptotic FDR control: the plug-in procedure BH95($\alpha/\hat{\pi}_0$) asymptotically controls FDR at level $\pi_0 \alpha / \overline{\pi}_0$, and the convergence rate of its FDP is the same as that of $\hat{\pi}_0$.

We use the same notation as in Neuvial (2008) and refer to this paper for more detailed explanation. In particular, we assume without loss of generality that $\pi_0$ does not depend on $m$, although strictly speaking we should write $\pi_0 = m_0(m)/m$, where the number $m_0(m)$ of true null hypotheses depends on the total number $m$ of hypotheses tested, and $\pi_0$ is only the limit of $m_0(m)/m$ as $m \to +\infty$. We study the BH95 procedure at level $\alpha/\hat{\pi}_0$, where $\hat{\pi}_0$ is an estimator of $\pi_0$ that converges to $\overline{\pi}_0$ at rate $\sqrt{m h_m}$, where $h_m \to 0$. This procedure rejects all hypotheses with p-values smaller than

$$\hat{\tau} = \sup\{t \in [0, 1],\ G_m(t) \geq \hat{\pi}_0 t/\alpha\}.$$

The associated proportion of rejections and proportion of incorrect rejections are given by $\hat{\rho} = G_m(\hat{\tau}) = \hat{\pi}_0 \hat{\tau}/\alpha$, and $\hat{\nu} = \pi_0 G_{0,m}(\hat{\tau})$, respectively, where $G_{0,m}$ denotes the empirical distribution function of p-values that correspond to true null hypotheses. We define the corresponding asymptotic threshold as the threshold of the BH95($\alpha/\overline{\pi}_0$) procedure:

$$\tau^* = \sup\{t \in [0, 1],\ G(t) \geq \overline{\pi}_0 t/\alpha\}.$$

Note that by the definition of $\hat{\tau}$ and $\tau^*$, we have $G_m(\hat{\tau}) = \hat{\pi}_0 \hat{\tau}/\alpha$ and $G(\tau^*) = \overline{\pi}_0 \tau^*/\alpha$. The following Proposition shows that the convergence rate of $(\hat{\tau}, \hat{\rho})$ is driven by the convergence rate of $\hat{\pi}_0$.



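Proposition 20 Let $\alpha^*$ be the critical value of the BH95 procedure. Let $\alpha > \overline{\pi}_0 \alpha^*$, and $\hat{\pi}_0$ be an estimator of $\pi_0$ with asymptotic distribution given by $\sqrt{m h_m}\,(\hat{\pi}_0 - \overline{\pi}_0) \rightsquigarrow \mathcal{N}(0, v(\overline{\pi}_0))$ for some function $v$, where $h_m = o(1/\ln\ln m)$. Then, as $m \to +\infty$,

$$\begin{pmatrix} \hat{\tau} - \tau^* \\ \hat{\nu} - \pi_0 \tau^* \\ \hat{\rho} - G(\tau^*) \end{pmatrix} = \frac{\tau^*/\alpha}{g(\tau^*) - \overline{\pi}_0/\alpha} \begin{pmatrix} 1 \\ \pi_0 \\ g(\tau^*) \end{pmatrix} (\hat{\pi}_0 - \overline{\pi}_0)\,(1 + o_P(1)).$$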



Note that $\overline{\pi}_0 \alpha^*$ is the critical value of the BH95($\alpha/\hat{\pi}_0$) procedure as long as $\hat{\pi}_0$ converges in probability to $\overline{\pi}_0$. Therefore the condition $\alpha > \overline{\pi}_0 \alpha^*$ simply ensures that we are not in a critical situation for the BH95($\alpha/\hat{\pi}_0$) procedure. Combined with the fact that the FDP achieved by a multiple testing procedure is a smooth function of the proportion of rejections $\hat{\rho}$ and the proportion of incorrect rejections $\hat{\nu}$, Proposition 20 implies that the convergence rate of $\hat{\pi}_0$ determines the convergence rate of the FDP of the associated plug-in procedure:

Theorem 21 (Asymptotic FDP for plug-in procedures) Let $\hat{\pi}_0$ be an estimator of $\pi_0$ with asymptotic distribution given by

$$\sqrt{m h_m}\,(\hat{\pi}_0 - \overline{\pi}_0) \rightsquigarrow \mathcal{N}(0, v(\overline{\pi}_0))$$

for some function $v$, with $h_m = o(1/\ln\ln m)$. Consider the plug-in procedure based on $\hat{\pi}_0$, which applies the BH95 procedure at level $\alpha/\hat{\pi}_0$. Let $\alpha^*$ be the critical value of the standard BH95($\alpha$) procedure. Under Condition 1 (concavity of $G$), for any $\alpha > \overline{\pi}_0 \alpha^*$, the asymptotic distribution of the FDP achieved by the BH95($\alpha/\hat{\pi}_0$) procedure is given by

$$\sqrt{m h_m} \left( \mathrm{FDP} - \frac{\pi_0 \alpha}{\overline{\pi}_0} \right) \rightsquigarrow \mathcal{N}\left(0, \left(\frac{\pi_0 \alpha}{\overline{\pi}_0^2}\right)^2 v(\overline{\pi}_0)\right).$$


As a consequence, the FDP of plug-in BH95 procedures associated with the estimators of $\pi_0$ studied in Section 4.2 can be derived by combining the results of Proposition 16, Proposition 19 and Theorem 21.



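Corollary 22 Assume that Condition 1 (concavity) holds and that $g_1$ is $k$ times differentiable at 1 with $g_1^{(k)}(1) \neq 0$. Let $h_m(k) = m^{-1/(2k+1)} \eta_m^2$, where $\eta_m \to 0$ as $m \to +\infty$. Further assume that we are in one of the following two situations:

1. $\hat{\pi}_0 = \frac{1 - G_m(1 - h_m(k))}{h_m(k)}$, and $g_1^{(l)}(1) = 0$ for $1 \leq l < k$;

2. $\hat{\pi}_0$ is a kernel estimator of $g(1)$ associated with a $k$-th order kernel with bandwidth $h_m(k)$.

Let $\alpha^*$ be the critical value of the BH95 procedure. Then, for any $\alpha > \overline{\pi}_0 \alpha^*$, the asymptotic distribution of the FDP achieved by the BH95($\alpha/\hat{\pi}_0$) procedure is given by

$$m^{k/(2k+1)}\, \eta_m \left( \mathrm{FDP} - \frac{\pi_0 \alpha}{\overline{\pi}_0} \right) \rightsquigarrow \mathcal{N}\left(0, \frac{(\pi_0 \alpha)^2}{\overline{\pi}_0^3}\right).$$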

In particular, if the purity condition is met, the asymptotic FDP achieved by the estimators in Corollary 22 is exactly $\alpha$ (and the asymptotic variance is $\alpha^2/\pi_0$).
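The plug-in construction discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the function names `bh_rejections`, `storey_pi0` and `plugin_bh_rejections` are hypothetical, Storey's estimator is taken with a fixed `lam`, and the clipping of the estimate to the interval [1/m, 1] is an added guard against division by zero.

```python
def bh_rejections(pvalues, level):
    """Number of rejections of the BH step-up procedure at the given level."""
    m = len(pvalues)
    n_reject = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        # Keep the largest rank whose p-value lies under the BH line.
        if p <= level * rank / m:
            n_reject = rank
    return n_reject


def storey_pi0(pvalues, lam):
    """Storey-type estimate: fraction of p-values >= lam, rescaled by 1 - lam."""
    m = len(pvalues)
    return sum(1 for p in pvalues if p >= lam) / (m * (1.0 - lam))


def plugin_bh_rejections(pvalues, alpha, lam=0.5):
    """BH applied at level alpha / pi0_hat (plug-in procedure)."""
    m = len(pvalues)
    pi0_hat = storey_pi0(pvalues, lam)
    # Assumption: clip the estimate to [1/m, 1] so the level stays finite.
    pi0_hat = min(1.0, max(pi0_hat, 1.0 / m))
    return bh_rejections(pvalues, alpha / pi0_hat)
```

On the toy input `[0.001, 0.002, 0.003, 0.09, 0.9]` with `alpha = 0.05`, the standard procedure rejects three hypotheses while the plug-in version rejects four, which illustrates the gain in power when the estimate of $\pi_0$ is below 1.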



4.4 Regularity of $g_1$ at 1

We conclude this section by studying convergence rates in one- and two-sided Gaussian 
and Laplace multiple comparison problems. 



4.4.1 Two-sided problems 

Proposition 23 (Behavior of $g_1$ at 1 in two-sided symmetric models) Under Condition 2, we have:

1. $$g_1^{\pm}(1) = \frac{f_1}{f_0}(0).$$

2. If the likelihood ratio is differentiable in a neighborhood of 0, the density $g_1^{\pm}$ of the two-sided p-values under the alternative hypothesis verifies
$$(g_1^{\pm})^{(1)}(1) = 0.$$

3. If $f_1/f_0$ is twice differentiable in a neighborhood of 0, then we have
$$(g_1^{\pm})^{(2)}(1) = \frac{1}{4 f_0(0)^2}\left(\frac{f_1}{f_0}\right)^{(2)}(0).$$

For illustration we apply this result to Storey's estimator in Gaussian two-sided location models, where the likelihood ratio is given by $\frac{f_1}{f_0}(t) = \exp(-\theta^2/2 + \theta t)$. By Proposition 23, we have $g_1^{\pm}(1) = e^{-\theta^2/2}$, $(g_1^{\pm})^{(1)}(1) = 0$, and $(g_1^{\pm})^{(2)}(1) > 0$. Therefore, we can use Corollary 22 with $k = 2$ to derive the following result:

Proposition 24 (Two-sided Gaussian test statistics) Assume that test statistics are distributed as $\mathcal{N}(0, 1)$ under the null hypothesis and as $\mathcal{N}(\theta, 1)$ under the alternative. Let $\hat\pi_0$ be Storey's estimator with bandwidth $h_m = m^{-1/5}\eta_m^2$, where $\eta_m \to 0$ as $m \to +\infty$, that is,

$$\hat\pi_0 = \frac{1 - \hat{G}_m(1 - h_m)}{h_m}.$$

Then, letting $\bar\pi_0 = \pi_0 + (1 - \pi_0)e^{-\theta^2/2}$, we have:

1. $$m^{2/5}\,\eta_m\,(\hat\pi_0 - \bar\pi_0) \rightsquigarrow \mathcal{N}(0, \bar\pi_0).$$

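2. Let $\alpha^\star$ be the critical value of the BH95 procedure. For any $\alpha > \bar\pi_0\alpha^\star$, the asymptotic distribution of the FDP achieved by the BH95($\alpha/\hat\pi_0$) procedure is given by

$$m^{2/5}\,\eta_m\left(\mathrm{FDP}_m - \frac{\pi_0\alpha}{\bar\pi_0}\right) \rightsquigarrow \mathcal{N}\left(0, \frac{\pi_0^2\alpha^2}{\bar\pi_0^3}\right).$$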


Note that Proposition 23 cannot be applied to two-sided Laplace statistics as the likelihood ratio $\frac{f_1}{f_0}(t) = \exp(|t| - |t - \theta|)$ has a singularity at $t = 0$; in this particular case the distribution of the two-sided p-values can be calculated directly by combining Corollary 3 with Proposition 25 (4):

$$g_1^{\pm}(u) = \begin{cases}\cosh\theta & \text{if } 0 \le u \le e^{-\theta},\\[2pt] \dfrac{e^{-\theta}}{2}\left(1 + \dfrac{1}{u^2}\right) & \text{if } e^{-\theta} \le u \le 1.\end{cases}$$

Therefore, we have $(g_1^{\pm})^{(1)}(1) = -e^{-\theta} \neq 0$, and the optimal bandwidth in Corollary 22 is of order $m^{-1/3}$ instead of $m^{-1/5}$.
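To make the role of the regularity order $k$ concrete, the following sketch evaluates the bandwidth $h_m(k) = m^{-1/(2k+1)}\eta_m^2$ of Corollary 22 and the resulting scaling $\sqrt{m h_m} = m^{k/(2k+1)}\eta_m$. The function names `bandwidth` and `scaling` are hypothetical helpers introduced here for illustration only.

```python
import math


def bandwidth(m, k, eta=1.0):
    """Bandwidth h_m(k) = m^(-1/(2k+1)) * eta^2 (eta is the vanishing factor)."""
    return m ** (-1.0 / (2 * k + 1)) * eta ** 2


def scaling(m, k, eta=1.0):
    """Normalizing factor sqrt(m * h_m(k)), equal to m^(k/(2k+1)) * eta."""
    return math.sqrt(m * bandwidth(m, k, eta))
```

For instance, with $m = 10^5$ and $\eta_m = 1$, the case $k = 2$ (two-sided Gaussian) gives a bandwidth of 0.1 and a scaling of 100, whereas $k = 1$ (two-sided Laplace) gives a smaller scaling, of order $m^{1/3}$.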



4.4.2 One-sided problems 

For one-sided p-values in symmetric models, we have proved that using the class of estimators of $\pi_0$ studied here, consistency can be achieved if and only if there is no criticality (Proposition 13). For Laplace test statistics, criticality occurs, and the distribution of one-sided p-values satisfies $1 - G_1(u) = e^{-\theta}(1 - u)$ for $u \ge 1/2$. Therefore, for $u \ge 1/2$, $\frac{1 - G(u)}{1 - u}$ is constant, equal to $\bar\pi_0 = \pi_0 + (1 - \pi_0)e^{-\theta}$, as illustrated by the solid curves in Figure 3. Therefore, Storey's estimator with any $\lambda \ge 1/2$ is an unbiased estimator of $\bar\pi_0$, which converges to $\bar\pi_0$ at rate $1/\sqrt{m}$.

In the Gaussian case, there is no criticality, so $\pi_0$ can be consistently estimated from one-sided p-values, but the regularity of $g_1^{+}$ at 1 is poor: we have

$$g_1^{+}(x) = \exp\left(-\frac{\theta^2}{2} - \theta\,\Phi^{-1}(x)\right),$$

where $\Phi\,(= F_0)$ denotes the standard Gaussian distribution function. As $h \to 0$, $\Phi^{-1}(1 - h) \le \sqrt{2\ln(1/h)}$, so that

$$g_1^{+}(1 - h) \ge \exp\left(-\frac{\theta^2}{2} - \theta\sqrt{2\ln(1/h)}\right).$$

This implies that $g_1^{+}$ is not differentiable at 1, meaning that consistent estimators of $\pi_0$ based on the p-values close to 1 have convergence rates slower than $m^{-1/3}$ in our setting. This difference of behavior between the one- and two-sided Gaussian multiple testing problems is illustrated by Figure 8 for the simplest location model: $\mathcal{N}(0, 1)$ against $\mathcal{N}(1, 1)$.



Figure 8: Density of one- and two-sided p-values under the alternative hypothesis for the location model $\mathcal{N}(0, 1)$ versus $\mathcal{N}(1, 1)$. Inside plot: zoom in the region [0.9, 1], which is highlighted by a black box in the main plot.



The density of two-sided p-values has a positive limit at 1, and its derivative at 1 is 0, making it possible to estimate $\bar\pi_0 = \pi_0 + (1 - \pi_0)e^{-\theta^2/2}$ at rate $m^{-2/5}$. Conversely, the density of one-sided p-values goes to 0 at 1, but is not differentiable there, so that the true $\pi_0$ can be estimated but the convergence rate is much slower.
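The contrast between the two densities can be checked numerically. The sketch below evaluates the one-sided density $\exp(-\theta^2/2 - \theta\Phi^{-1}(u))$ and the two-sided density, written here as $e^{-\theta^2/2}\cosh(\theta\,\Phi^{-1}(1-u/2))$ (the average of the likelihood ratio at the two symmetric quantiles). The function names are hypothetical, and only the Python standard library is used.

```python
from math import cosh, exp
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, Phi = F_0


def g_one_sided(u, theta):
    """Density of one-sided p-values under N(theta, 1), for 0 < u < 1."""
    return exp(-theta ** 2 / 2 - theta * _STD_NORMAL.inv_cdf(u))


def g_two_sided(u, theta):
    """Density of two-sided p-values under N(theta, 1), for 0 < u <= 1."""
    q = _STD_NORMAL.inv_cdf(1 - u / 2)  # q = 0 when u = 1
    return exp(-theta ** 2 / 2) * cosh(theta * q)
```

With $\theta = 1$, `g_two_sided(1.0, 1.0)` equals $e^{-1/2}$ and is flat near 1, while the difference quotient `g_one_sided(1 - h, 1.0) / h` keeps growing as `h` decreases, consistent with the lack of differentiability at 1.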
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4.4.3 Practical implications 

In practice, when $g_1(1) > 0$, its value depends on the settings of the multiple comparison problem: for example, in the Student two-sided problem, we have $g_1^{\pm}(1) = \frac{f_1}{f_0}(0) = e^{-\theta^2/2}$ by Proposition 9, that is, with notation of Section 3,

$$g_1^{\pm}(1) = \exp\left(-\frac{1}{2}\,\frac{\delta^2}{1/n_X + 1/n_Y}\right).$$

In particular, $g_1^{\pm}(1)$ considerably depends on the longitudinal sample size $n_X + n_Y$. For the Golub data set, we have $n_X = 27$, $n_Y = 11$, and we estimated $\theta = 2.5$, corresponding to an effect size $\delta = 0.9$. These parameters yield $g_1^{\pm}(1) \approx 0.04$, but if we consider a situation where the sample size is twice as large ($n_X = 54$, $n_Y = 22$), we get $g_1^{\pm}(1) \approx 0.002$ for the same effect size, and the bias $\bar\pi_0 - \pi_0$ is negligible in practice. These remarks suggest that it would be interesting to conduct a study of the bias/variance trade-off in the estimation of $\pi_0$ that would explicitly take longitudinal sample size into account.
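The numerical values quoted above can be reproduced with a short computation. The helper name `g1_at_one` is hypothetical; it simply evaluates $\exp\left(-\tfrac12\,\delta^2/(1/n_X + 1/n_Y)\right)$.

```python
from math import exp


def g1_at_one(delta, n_x, n_y):
    """Value at 1 of the two-sided p-value density under the alternative."""
    theta_squared = delta ** 2 / (1.0 / n_x + 1.0 / n_y)
    return exp(-theta_squared / 2)


golub = g1_at_one(0.9, 27, 11)    # roughly 0.04
doubled = g1_at_one(0.9, 54, 22)  # roughly 0.002
```

Doubling both sample sizes doubles $\theta^2$, so the second value is the square of the first, which is why the bias term shrinks so quickly with the sample size.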

Recent work suggests two alternative research directions for estimating $\pi_0$:

• one-stage adaptive procedures as proposed by Blanchard and Roquain (to appear) and Finner et al. (2009) allow more powerful FDR control than the standard BH95 procedure without explicitly incorporating an estimate of $\pi_0$: they are not plug-in procedures.

• Jin (2008) proposed an estimator of $\pi_0$ based on the Fourier transform of the empirical function of the p-values, which does not focus on the behavior of the density near 1, and might not suffer from the same limitations as the estimators studied here. In particular this estimator was shown to be consistent for the estimation of $\pi_0$ when the p-values (not the test statistics) follow a Gaussian location mixture.

Appendix: proofs 
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Appendix A. Proofs of section 2 

Proof [Proof of Proposition 1] Let $x \in \mathbb{R}$. We have $p^{+}(x) = \mathbb{P}_{\mathcal{H}_0}(X \ge x) = 1 - F_0(x)$ by the definition of $F_0$. Then, for any $u \in [0, 1]$, we have

$$G_1^{+}(u) = \mathbb{P}_{\mathcal{H}_1}(p^{+}(X) \le u) = \mathbb{P}_{\mathcal{H}_1}(X \ge q_0(u)),$$

where $q_0 : u \mapsto F_0^{-1}(1 - u)$. Therefore $G_1(u) = 1 - F_1(q_0(u))$, which proves Equation (1). Equation (2) then follows from the fact that $q_0'(u) = -1/f_0(q_0(u))$. ■

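Proof [Proof of Proposition 2] Let $x \in \mathbb{R}$. We have

$$\begin{aligned} p^{\pm}(x) &= \mathbb{P}_{\mathcal{H}_0}(|X| \ge |x|)\\ &= \mathbb{P}_{\mathcal{H}_0}(X \ge |x|) + \mathbb{P}_{\mathcal{H}_0}(X \le -|x|)\\ &= 1 - F_0(|x|) + F_0(-|x|)\\ &= 2(1 - F_0(|x|)), \end{aligned}$$

as $F_0$ is assumed to be symmetric. Then, for any $u \in [0, 1]$, we have

$$\begin{aligned} G_1^{\pm}(u) &= \mathbb{P}_{\mathcal{H}_1}(p^{\pm}(X) \le u)\\ &= \mathbb{P}_{\mathcal{H}_1}(F_0(|X|) \ge 1 - u/2)\\ &= \mathbb{P}_{\mathcal{H}_1}(X \ge q_0(u/2)) + \mathbb{P}_{\mathcal{H}_1}(X \le q_0(1 - u/2)), \end{aligned}$$

where $q_0 : u \mapsto F_0^{-1}(1 - u)$. Therefore $G_1^{\pm}(u) = 1 - F_1(q_0(u/2)) + F_1(q_0(1 - u/2))$, which proves Equation (3). Equation (4) then follows from the fact that $q_0'(u) = -1/f_0(q_0(u))$. ■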



Appendix B. Proofs of section 3 
Laplace distribution 

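Proposition 25 (One-sided Laplace problem) Assume that the pdf of the test statistics is $f_0 : x \mapsto \frac12 e^{-|x|}$ under the null hypothesis, and $f_1 : x \mapsto \frac12 e^{-|x - \theta|}$ under the alternative, with $\theta > 0$ (one-sided test). Then

1. The one-sided p-value function is

$$1 - F_0(x) = \begin{cases}\frac12 e^{-|x|} & \text{if } x \ge 0,\\ 1 - \frac12 e^{-|x|} & \text{if } x < 0.\end{cases}$$

2. The inverse one-sided p-value function is

$$(1 - F_0)^{-1}(u) = \begin{cases}\ln\frac{1}{2u} & \text{if } 0 \le u \le \frac12,\\ \ln(2(1 - u)) & \text{if } \frac12 < u \le 1.\end{cases}$$

3. The cdf of one-sided p-values under $\mathcal{H}_1$ is

$$G_1(u) = \begin{cases}u e^{\theta} & \text{if } 0 \le u \le \frac{e^{-\theta}}{2},\\ 1 - \frac{1}{4u}e^{-\theta} & \text{if } \frac{e^{-\theta}}{2} \le u \le \frac12,\\ 1 - (1 - u)e^{-\theta} & \text{if } u \ge \frac12.\end{cases}$$

4. The pdf of one-sided p-values under $\mathcal{H}_1$ is

$$g_1(u) = \begin{cases}e^{\theta} & \text{if } 0 \le u \le \frac{e^{-\theta}}{2},\\ \frac{1}{4u^2}e^{-\theta} & \text{if } \frac{e^{-\theta}}{2} \le u \le \frac12,\\ e^{-\theta} & \text{if } u \ge \frac12.\end{cases}$$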
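As a sanity check of the piecewise formulas for the one-sided Laplace problem, the sketch below codes the cdf $G_1$ of the p-values under the alternative (equal to $ue^{\theta}$, $1 - e^{-\theta}/(4u)$ and $1 - (1-u)e^{-\theta}$ on the three successive intervals) and the ratio $(1 - G(\lambda))/(1 - \lambda)$ targeted by Storey's estimator. The function names are hypothetical, and the mixture $G(\lambda) = \pi_0\lambda + (1 - \pi_0)G_1(\lambda)$ is the one used throughout the paper.

```python
from math import exp


def laplace_cdf_pvalue(u, theta):
    """cdf G_1 of one-sided p-values under the Laplace alternative."""
    if u <= exp(-theta) / 2:
        return u * exp(theta)
    if u <= 0.5:
        return 1 - exp(-theta) / (4 * u)
    return 1 - (1 - u) * exp(-theta)


def storey_target(lam, pi0, theta):
    """(1 - G(lam)) / (1 - lam) for the mixture G = pi0*u + (1 - pi0)*G_1."""
    g_mix = pi0 * lam + (1 - pi0) * laplace_cdf_pvalue(lam, theta)
    return (1 - g_mix) / (1 - lam)
```

For any `lam >= 0.5` the ratio returned by `storey_target` is the same constant $\pi_0 + (1 - \pi_0)e^{-\theta}$, which is the quantity that Storey's estimator targets in this model.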

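Proof [Proof of Proposition 25] The inverse p-value function directly follows from the p-value function and the pdf of the p-values follows from the cdf, so we only prove (1) (p-value function), and (3) (cdf of the p-values).

Proof of (1). We have $f_0(x) = \frac12 e^{-|x|}$. Therefore, for $x < 0$, $F_0(x) = \int_{-\infty}^{x} \frac12 e^{t}\,dt = \frac12 e^{-|x|}$. For $x \ge 0$, $F_0(x) = \int_{-\infty}^{0} \frac12 e^{t}\,dt + \int_{0}^{x} \frac12 e^{-t}\,dt = 1 - \frac12 e^{-|x|}$.

Proof of (3). Let $u \in [0, 1]$. The distribution function of one-sided p-values is given by

$$G_1(u) = \mathbb{P}_\theta(1 - F_0(X) \le u) = \mathbb{P}_\theta\left(X \ge (1 - F_0)^{-1}(u)\right) = \int_{(1 - F_0)^{-1}(u)}^{+\infty} f_1(x)\,dx.$$

For $u \le \frac12$, $(1 - F_0)^{-1}(u) = \ln\frac{1}{2u}$ and $(1 - F_0)^{-1}(u) \ge \theta \iff u \le \frac{e^{-\theta}}{2}$.

Hence if $u \le \frac{e^{-\theta}}{2}$,

$$G_1(u) = \int_{\ln\frac{1}{2u}}^{+\infty} \frac12 e^{-(x - \theta)}\,dx = \frac12 e^{-\left(\ln\frac{1}{2u} - \theta\right)} = u e^{\theta}.$$

If $\frac{e^{-\theta}}{2} \le u \le \frac12$,

$$G_1(u) = \int_{\ln\frac{1}{2u}}^{\theta} \frac12 e^{x - \theta}\,dx + \int_{\theta}^{+\infty} f_1(x)\,dx = \frac12\left(1 - e^{\ln\frac{1}{2u} - \theta}\right) + \frac12 = 1 - \frac{1}{4u}e^{-\theta}.$$

Finally, for $u \ge \frac12$, $(1 - F_0)^{-1}(u) = \ln 2(1 - u)$. Thus $(1 - F_0)^{-1}(u) \le \theta \iff u \ge 1 - \frac{e^{\theta}}{2}$, which always holds for $u \ge \frac12$ because $\theta > 0$. Hence for $u \ge \frac12$,

$$G_1(u) = \int_{\ln 2(1 - u)}^{\theta} \frac12 e^{x - \theta}\,dx + \frac12 = \frac12\left(1 - e^{\ln 2(1 - u) - \theta}\right) + \frac12 = 1 - (1 - u)e^{-\theta}. \qquad ■$$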



Student distribution 

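Lemma 26 (Non-central Student distribution) The density function of a non-central Student distribution with $k$ degrees of freedom and non-centrality parameter $\theta$ may be written as

$$f_1(t) = \frac{k!}{2^{\frac{k-1}{2}}\,\Gamma\left(\frac{k}{2}\right)\sqrt{k\pi}}\left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}}\exp\left[-\frac12\,\frac{\theta^2}{1 + \frac{t^2}{k}}\right] Hh_k\left(-\frac{\theta t}{\sqrt{k + t^2}}\right),$$

where

$$Hh_k(z) = \int_0^{+\infty} \frac{x^k}{k!}\,e^{-\frac12(x + z)^2}\,dx.$$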



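Proof Let $T_{k,\theta} = \frac{Z_\theta}{\sqrt{U/k}}$, where $Z_\theta \sim \mathcal{N}(\theta, 1)$ and $U \sim \chi^2(k)$, with $Z_\theta$ and $U$ independent.

We have $f_1(t) = \frac{d}{dt}\left(\mathbb{P}(T_{k,\theta} \le t)\right) = \frac{d}{dt}\,\mathbb{P}\left(Z_\theta \le t\sqrt{U/k}\right)$. As $Z_\theta$ and $U$ are independent, we have

$$f_1(t) = \frac{d}{dt}\int_0^{+\infty} \Phi\left(t\sqrt{u/k} - \theta\right) f_U(u)\,du,$$

where $f_U(u) = \frac{u^{k/2 - 1}e^{-u/2}}{2^{k/2}\Gamma(k/2)}$ is the density of $U$. Thus, inverting $\int$ and $\frac{d}{dt}$,

$$f_1(t) = \frac{1}{\sqrt{2k\pi}\,2^{k/2}\,\Gamma(k/2)}\int_0^{+\infty} \exp\left[-\frac12\left(t\sqrt{u/k} - \theta\right)^2 - \frac{u}{2}\right] u^{\frac{k-1}{2}}\,du.$$

Then, using the transformation $v = \sqrt{(1 + t^2/k)\,u}$, we note that

$$\begin{aligned} \int_0^{+\infty} \exp\left[-\frac12\left(t\sqrt{u/k} - \theta\right)^2 - \frac{u}{2}\right] u^{\frac{k-1}{2}}\,du &= \int_0^{+\infty} \exp\left[-\frac12\left(v^2 - \frac{2\theta t v}{\sqrt{k + t^2}} + \theta^2\right)\right]\frac{v^{k-1}}{(1 + t^2/k)^{\frac{k-1}{2}}}\,\frac{2v\,dv}{1 + t^2/k}\\ &= \frac{2}{(1 + t^2/k)^{\frac{k+1}{2}}}\exp\left[-\frac12\,\frac{\theta^2}{1 + t^2/k}\right]\int_0^{+\infty} v^k \exp\left[-\frac12\left(v - \frac{\theta t}{\sqrt{k + t^2}}\right)^2\right] dv\\ &= \frac{2}{(1 + t^2/k)^{\frac{k+1}{2}}}\exp\left[-\frac12\,\frac{\theta^2}{1 + t^2/k}\right] k!\,Hh_k\left(-\frac{\theta t}{\sqrt{k + t^2}}\right). \end{aligned}$$

Therefore

$$f_1(t) = \frac{1}{\sqrt{2k\pi}\,2^{k/2 - 1}\,\Gamma(k/2)}\left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}}\exp\left[-\frac12\,\frac{\theta^2}{1 + \frac{t^2}{k}}\right]\Gamma(k + 1)\,Hh_k\left(-\frac{\theta t}{\sqrt{k + t^2}}\right),$$

which completes the proof because $\Gamma(k + 1) = k!$. ■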

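Proof [Proof of Proposition 9] With notation of Lemma 26, the probability density function of the central $t$ distribution with $k$ degrees of freedom is given by

$$f_0(t) = \frac{\Gamma(k + 1)}{2^{\frac{k-1}{2}}\,\Gamma\left(\frac{k}{2}\right)\sqrt{k\pi}}\left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}} Hh_k(0),$$

and the likelihood ratio of the model is given by

$$\frac{f_1}{f_0}(t) = \exp\left[-\frac12\,\frac{\theta^2}{1 + \frac{t^2}{k}}\right]\frac{Hh_k\left(-\frac{\theta t}{\sqrt{k + t^2}}\right)}{Hh_k(0)}. \qquad ■$$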



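The following property of $Hh_k$ is useful to prove that $\frac{f_1}{f_0}$ is non-decreasing.

Lemma 27

$$Hh_{k+1}'(z) = -Hh_k(z).$$

Proof [Proof of Lemma 27] Let $k \in \mathbb{N}$. As $Hh_{k+1}(z) = \int_0^{+\infty} \frac{x^{k+1}}{(k+1)!}\,e^{-\frac12(x + z)^2}\,dx$, we have

$$\begin{aligned} Hh_{k+1}'(z) &= -\int_0^{+\infty} \frac{x^{k+1}}{(k+1)!}\,(x + z)\,e^{-\frac12(x + z)^2}\,dx\\ &= \left[\frac{x^{k+1}}{(k+1)!}\,e^{-\frac12(x + z)^2}\right]_0^{+\infty} - \int_0^{+\infty} \frac{x^{k}}{k!}\,e^{-\frac12(x + z)^2}\,dx\\ &= 0 - Hh_k(z). \qquad ■ \end{aligned}$$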



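Proof [Proof of Proposition 10]

1. As $t \mapsto \exp\left[-\frac12\,\frac{\theta^2}{1 + t^2/k}\right]$ is non-decreasing and $t \mapsto -\frac{\theta t}{\sqrt{k + t^2}}$ is non-increasing, it is sufficient to prove that $Hh_k$ is non-increasing, which follows from Lemma 27 because $Hh_{k-1}$ is positive.

2. By Proposition 5 it suffices to note that

$$\lim_{t \to +\infty} \frac{f_1}{f_0}(t) = \frac{Hh_k(-\theta)}{Hh_k(0)}. \qquad ■$$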



Proof of Proposition 11

Proof [Proof of Proposition 11] By the definition of $\Pi_m$ we have, for any $\alpha > 0$,

$$\Pi_m(\alpha) = \frac{1 - V_m(\alpha)/R_m(\alpha)}{1 - \pi_0}\,R_m(\alpha)/m.$$

When $\alpha > \alpha^\star$, the limit in probability of $V_m(\alpha)/R_m(\alpha)$ is the asymptotic FDR achieved by the BH95 procedure, that is, $\pi_0\alpha$ (Chi, 2007a). When $\alpha < \alpha^\star$, the number of rejections by the BH95 procedure is asymptotically bounded (Chi, 2007a), so that both $\Pi_m(\alpha)$ and $R_m(\alpha)/m$ converge to 0 in probability. ■



Appendix C. Proofs of section 4 

Consistency, purity and criticality 
Proof [Proof of Lemma 12] We note that

$$\begin{aligned} \frac{f_1(x)}{f_0(x)} &= \frac{f_0(x - \theta)}{f_0(x)} && \text{by definition of a location model}\\ &= \frac{f_0(-x + \theta)}{f_0(-x)} && \text{by symmetry of } f_0\\ &= \frac{f_0(-x + \theta)}{f_1(-x + \theta)}, \end{aligned}$$

which concludes the proof, as $\theta$ is a fixed scalar. ■

Proof [Proof of Proposition 13] We have $\alpha^\star = \lim_{u \to 0} 1/g(u)$, where $g = \pi_0 + (1 - \pi_0)g_1$ and

$$g_1 = \frac{f_1}{f_0} \circ q_0.$$

Therefore, as $\lim_{u \to 0} q_0(u) = +\infty$, the result is a consequence of Lemma 12. ■

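Storey's estimator with $\lambda \to 1$

Proof [Proof of Proposition 15] First, we note that as $G(\lambda) = \pi_0\lambda + (1 - \pi_0)G_1(\lambda)$, we have, for any $\lambda < 1$,

$$\frac{1 - G(\lambda)}{1 - \lambda} = \pi_0 + (1 - \pi_0)\frac{1 - G_1(\lambda)}{1 - \lambda}. \qquad (10)$$

1. We demonstrate that $\hat\pi_0(1 - h_m)$ may be written as a sum of $m$ independent random variables that satisfy the Lindeberg-Feller conditions for the Central Limit Theorem (Pollard, 1984). Let $Z_i^m = \mathbf{1}_{P_i \ge 1 - h_m}$, where the $P_i$ are the p-values. $Z_i^m$ follows a Bernoulli distribution with parameter $p_m = 1 - G(1 - h_m)$. Denoting

$$Y_i^m = \frac{Z_i^m - \mathbb{E}[Z_i^m]}{\sqrt{m h_m}},$$

we have

$$\sum_{i=1}^{m} Y_i^m = \sqrt{m h_m}\left(\hat\pi_0(1 - h_m) - \mathbb{E}[\hat\pi_0(1 - h_m)]\right).$$

$(Y_i^m)_{1 \le i \le m}$ are centered, independent random variables, with $\mathrm{Var}\,Y_i^m = \frac{\mathrm{Var}\,Z_i^m}{m h_m} = \frac{G(1 - h_m)(1 - G(1 - h_m))}{m h_m}$, which, by (10), is equivalent to $\frac{\bar\pi_0}{m}$ as $m \to +\infty$. Therefore,

$$\lim_{m \to +\infty} \sum_{i=1}^{m} \mathbb{E}\left[(Y_i^m)^2\right] = \bar\pi_0.$$

Finally we prove that for any $\varepsilon > 0$,

$$\lim_{m \to +\infty} \sum_{i=1}^{m} \mathbb{E}\left[(Y_i^m)^2\,\mathbf{1}_{|Y_i^m| > \varepsilon}\right] = 0.$$

As $Z_i^m \in \{0, 1\}$ and $\mathbb{E}[Z_i^m] \in [0, 1]$, we have $(Y_i^m)^2 \le \frac{1}{m h_m}$ and

$$\begin{aligned} \sum_{i=1}^{m} \mathbb{E}\left[(Y_i^m)^2\,\mathbf{1}_{|Y_i^m| > \varepsilon}\right] &\le \frac{1}{m h_m}\sum_{i=1}^{m} \mathbb{E}\left[\mathbf{1}_{|Y_i^m| > \varepsilon}\right]\\ &= \frac{1}{h_m}\,\mathbb{P}\left(|Y_1^m| > \varepsilon\right)\\ &\le \frac{1}{h_m}\,\frac{\mathrm{Var}\,Y_1^m}{\varepsilon^2} \end{aligned}$$

by Chebyshev's inequality. As $m h_m \to +\infty$ and $\mathrm{Var}\,Y_1^m \sim \frac{\bar\pi_0}{m}$ as $m \to +\infty$, the above sum therefore goes to 0 as $m h_m \to +\infty$. The Lindeberg-Feller conditions for the Central Limit Theorem are thus fulfilled, and we have

$$\sum_{i=1}^{m} Y_i^m \rightsquigarrow \mathcal{N}(0, \bar\pi_0),$$

which concludes the proof because $\sum_{i=1}^{m} Y_i^m = \sqrt{m h_m}\left(\hat\pi_0(1 - h_m) - \mathbb{E}[\hat\pi_0(1 - h_m)]\right)$.

2. By (10), the bias is given by

$$\mathbb{E}[\hat\pi_0(\lambda)] - \pi_0 = (1 - \pi_0)\frac{1 - G_1(\lambda)}{1 - \lambda}.$$

A Taylor expansion as $\lambda \to 1$ yields

$$\begin{aligned} 1 - G_1(\lambda) &= \sum_{l=0}^{k} \frac{(-1)^l g_1^{(l)}(1)}{(l + 1)!}(1 - \lambda)^{l+1} + o\left((1 - \lambda)^{k+1}\right)\\ &= (1 - \lambda)g_1(1) + \frac{(-1)^k g_1^{(k)}(1)}{(k + 1)!}(1 - \lambda)^{k+1} + o\left((1 - \lambda)^{k+1}\right) \end{aligned}$$

as we assumed that $g_1^{(l)}(1) = 0$ for $1 \le l < k$ and $g_1^{(k)}(1) \neq 0$. Therefore, if $h_m \to 0$ as $m \to +\infty$, we have

$$\mathbb{E}[\hat\pi_0(1 - h_m)] - \bar\pi_0 = (1 - \pi_0)\frac{(-1)^k g_1^{(k)}(1)}{(k + 1)!}h_m^k + o\left(h_m^k\right),$$

which concludes the proof. ■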



Proof [Proof of Proposition 16] For (1) we begin by noting that by Proposition 15, the asymptotic variance of $\hat\pi_0(1 - h_m)$ is equivalent to $\frac{\bar\pi_0}{m h_m}$. As we assumed that the first $k - 1$ derivatives of $g_1$ at 1 are null, and that $g_1^{(k)}(1) \neq 0$, a Taylor expansion of $\mathbb{E}[\hat\pi_0(1 - h_m)] - \bar\pi_0$ ensures that the bias is of order $h_m^k$. The optimal bandwidth is obtained for $h_m$ proportional to $m^{-\frac{1}{2k+1}}$, because this choice balances variance and squared bias. The proportionality constant is an explicit function of $k$, $\pi_0$, and $g_1^{(k)}(1)$. By definition, the MSE that corresponds to this optimal choice is twice the corresponding squared bias, i.e. of order $m^{-\frac{2k}{2k+1}}$, which completes the proof of (1).

To prove (2), we note that

$$\sqrt{m h_m}\,(\hat\pi_0 - \bar\pi_0) = \sqrt{m h_m}\,(\hat\pi_0 - \mathbb{E}[\hat\pi_0]) + \sqrt{m h_m}\,(\mathbb{E}[\hat\pi_0] - \bar\pi_0),$$

where $\hat\pi_0$ denotes $\hat\pi_0(1 - h_m)$ to alleviate notation. The first term (variance) converges in distribution to $\mathcal{N}(0, \bar\pi_0)$ by Proposition 15 (1) as soon as $m h_m \to +\infty$. The second term (bias) is of the order of $\sqrt{m h_m}\,h_m^k = \sqrt{m h_m^{2k+1}}$ by Proposition 15 (2). Taking $h_m(k) = h_m^\star(k)\,\eta_m^2$, where $h_m^\star(k)$ denotes the optimal bandwidth and $\eta_m \to 0$, we have $m h_m^{2k+1} \to 0$, which ensures that the bias term converges in probability to 0. ■



Asymptotic FDP for plug-in procedures 

Lemma 28 With assumptions of Proposition 20, $\hat\tau$ converges almost surely to $\tau^\star$ as $m \to +\infty$.






p. Neuvial 



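Proof [Proof of Lemma 28] Let $\psi_{F,\gamma} : u \mapsto F(u) - u/\gamma$ for any distribution function $F$ and any $\gamma \in (0, 1]$. As $\hat{G}_m(\hat\tau) = \hat\pi_0\hat\tau/\alpha$ and $G(\tau^\star) = \bar\pi_0\tau^\star/\alpha$, we have $\psi_{\hat{G}_m, \alpha/\hat\pi_0}(\hat\tau) = 0$ and $\psi_{G, \alpha/\bar\pi_0}(\tau^\star) = 0$. The idea of the proof is to note that

1. $\psi_{G, \alpha/\bar\pi_0}(\hat\tau)$ converges almost surely to $0 = \psi_{G, \alpha/\bar\pi_0}(\tau^\star)$;

2. $\psi_{G, \alpha/\bar\pi_0}$ is locally invertible in a neighborhood of $\tau^\star$.

To prove (1), we note that

$$\psi_{G, \alpha/\bar\pi_0}(\hat\tau) = (G - \hat{G}_m)(\hat\tau) + \left(\hat{G}_m(\hat\tau) - \hat\pi_0\hat\tau/\alpha\right) + (\hat\pi_0 - \bar\pi_0)\hat\tau/\alpha.$$

The first term converges to 0 almost surely, the second is identically null, and the third converges almost surely to 0 because $\hat\pi_0$ converges in probability to $\bar\pi_0$, and $\hat\tau \in [0, 1]$. Item (2) holds because we are in a subcritical situation: $\alpha > \bar\pi_0\alpha^\star$, with $\alpha^\star = \lim_{u \to 0} u/G(u)$ (see (Neuvial, 2008, Lemma 7.6 page 1097) for a proof of the invertibility). Combining (1) and (2), $\hat\tau$ converges almost surely to $\tau^\star$. ■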



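Proof [Proof of Proposition 20] We only give the proof for $\hat\tau$, as the proofs for $\hat\nu$ and $\hat\rho$ are quite similar. The idea is that the fluctuations of $\hat{G}_m - G$, which are of order $1/\sqrt{m}$ by Donsker's theorem (Donsker, 1951), are negligible with respect to the fluctuations of $\hat\pi_0 - \bar\pi_0$, because these are assumed to be of order $1/\sqrt{m h_m}$ with $h_m \to 0$. Letting $\mathbb{G}_m = \hat{G}_m - G$ be the centered empirical process associated with $G$, we have

$$\begin{aligned} G(\hat\tau) - G(\tau^\star) &= \left(G(\hat\tau) - \hat{G}_m(\hat\tau)\right) + \left(\hat{G}_m(\hat\tau) - G(\tau^\star)\right)\\ &= -\mathbb{G}_m(\hat\tau) + \left(\hat\pi_0\hat\tau/\alpha - \bar\pi_0\tau^\star/\alpha\right) \end{aligned}$$

because $\hat{G}_m(\hat\tau) = \hat\pi_0\hat\tau/\alpha$ and $G(\tau^\star) = \bar\pi_0\tau^\star/\alpha$. Therefore,

$$G(\hat\tau) - G(\tau^\star) = -\mathbb{G}_m(\hat\tau) + \frac{\hat\pi_0}{\alpha}(\hat\tau - \tau^\star) + \frac{\hat\pi_0 - \bar\pi_0}{\alpha}\,\tau^\star.$$

As $\|\mathbb{G}_m\|_\infty \le c\sqrt{\ln\ln m/m}$ and $h_m = o(1/\ln\ln m)$, we have $\mathbb{G}_m(\hat\tau) = o_P\left(1/\sqrt{m h_m}\right)$. Since $\hat\tau \to \tau^\star$ almost surely as $m \to +\infty$, we also have $G(\hat\tau) - G(\tau^\star) = (\hat\tau - \tau^\star)(g(\tau^\star) + o_P(1))$ by Taylor's formula. Hence we have

$$\left(g(\tau^\star) - \frac{\hat\pi_0}{\alpha}\right)(\hat\tau - \tau^\star) = (\hat\pi_0 - \bar\pi_0)\frac{\tau^\star}{\alpha} + o_P\left(1/\sqrt{m h_m}\right).$$

Finally, as

$$\sqrt{m h_m}\,(\hat\pi_0 - \bar\pi_0) \rightsquigarrow \mathcal{N}(0, v(\bar\pi_0)),$$

we have

$$\hat\tau - \tau^\star = \frac{\tau^\star/\alpha}{g(\tau^\star) - \bar\pi_0/\alpha}\,(\hat\pi_0 - \bar\pi_0)(1 + o_P(1)),$$

which concludes the proof for $\hat\tau$. ■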



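Proof [Proof of Theorem 21] By Proposition 20, we have

$$\sqrt{m h_m}\left(\begin{pmatrix}\hat\nu\\ \hat\rho\end{pmatrix} - \begin{pmatrix}\pi_0\tau^\star\\ \bar\pi_0\tau^\star/\alpha\end{pmatrix}\right) \rightsquigarrow \mathcal{N}(0, V)$$

with

$$V = \left(\frac{\tau^\star/\alpha}{g(\tau^\star) - \bar\pi_0/\alpha}\right)^2 v(\bar\pi_0)\begin{pmatrix}\pi_0\\ g(\tau^\star)\end{pmatrix}\begin{pmatrix}\pi_0 & g(\tau^\star)\end{pmatrix}.$$

We write $\mathrm{FDP} = \gamma(\hat\nu, \hat\rho)$, where $\gamma : (x, y) \mapsto x/y$ for any $x \ge 0$ and $y > 0$. $\gamma$ is differentiable at $(\pi_0\tau^\star, \bar\pi_0\tau^\star/\alpha)$, with derivative

$$\nabla\gamma(x, y) = \left(\frac{1}{y}, -\frac{x}{y^2}\right).$$

As $\gamma(\pi_0\tau^\star, \bar\pi_0\tau^\star/\alpha) = \frac{\pi_0\alpha}{\bar\pi_0}$, the Delta method yields

$$\sqrt{m h_m}\left(\mathrm{FDP}_m - \frac{\pi_0\alpha}{\bar\pi_0}\right) \rightsquigarrow \mathcal{N}(0, w),$$

with

$$w = \left(\frac{\tau^\star/\alpha}{g(\tau^\star) - \bar\pi_0/\alpha}\right)^2 v(\bar\pi_0)\left(\frac{\pi_0\alpha}{\bar\pi_0\tau^\star}\right)^2\left(1 - \frac{\alpha\,g(\tau^\star)}{\bar\pi_0}\right)^2 = \frac{\pi_0^2\alpha^2}{\bar\pi_0^4}\,v(\bar\pi_0). \qquad ■$$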



Proof of Proposition 23 

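Lemma 29 (Regularity of $g_1$ in two-sided symmetric models) Under Condition 2, if the likelihood ratio $\frac{f_1}{f_0}$ is differentiable on $\mathbb{R}$, then the density $g_1^{\pm}$ of the p-values under the alternative hypothesis is differentiable on $(0, 1]$ and verifies, for any $u \in (0, 1]$,

$$(g_1^{\pm})^{(1)}(u) = \frac{1}{4 f_0(q_0(u/2))}\left[\left(\frac{f_1}{f_0}\right)'(q_0(1 - u/2)) - \left(\frac{f_1}{f_0}\right)'(q_0(u/2))\right].$$

Proof [Proof of Lemma 29] By Proposition 2, we have

$$g_1^{\pm}(u) = \frac12\left(\frac{f_1}{f_0}(q_0(u/2)) + \frac{f_1}{f_0}(-q_0(u/2))\right),$$

where $q_0 : u \mapsto F_0^{-1}(1 - u)$. Therefore,

$$(g_1^{\pm})^{(1)}(u) = \frac{q_0'(u/2)}{4}\left[\left(\frac{f_1}{f_0}\right)'(q_0(u/2)) - \left(\frac{f_1}{f_0}\right)'(-q_0(u/2))\right],$$

which concludes the proof as $q_0$ satisfies $q_0(1 - u) = -q_0(u)$ and $q_0'(u) = -1/f_0(q_0(u))$. ■

Proof [Proof of Proposition 23] (1) was already stated in Proposition 14. By Lemma 29, we have $(g_1^{\pm})^{(1)}(1) = 0$, as $q_0(1/2) = F_0^{-1}(1/2) = 0$, which proves (2). For (3), note that if $\frac{f_1}{f_0}$ is twice differentiable in a neighborhood of 0, then by Lemma 29 $g_1^{\pm}$ is itself twice differentiable in a neighborhood of 1. Writing $(g_1^{\pm})^{(1)}(u) = a(u)b(u)$, with

$$\begin{cases} a(u) = 1/\left(4 f_0\left(F_0^{-1}(1 - u/2)\right)\right)\\[2pt] b(u) = \left(\frac{f_1}{f_0}\right)'\left(F_0^{-1}(u/2)\right) - \left(\frac{f_1}{f_0}\right)'\left(F_0^{-1}(1 - u/2)\right), \end{cases}$$

we have $(g_1^{\pm})^{(2)}(u) = a'(u)b(u) + a(u)b'(u)$. As $q_0(1/2) = F_0^{-1}(1/2) = 0$, we have $b(1) = 0$, so that $(g_1^{\pm})^{(2)}(1) = a(1)b'(1)$, where

$$b'(u) = \frac{1}{2 f_0\left(F_0^{-1}(1 - u/2)\right)}\left[\left(\frac{f_1}{f_0}\right)''\left(F_0^{-1}(1 - u/2)\right) + \left(\frac{f_1}{f_0}\right)''\left(F_0^{-1}(u/2)\right)\right].$$

Thus, $a(1) = 1/(4 f_0(0))$ and $b'(1) = 1/(2 f_0(0)) \times 2\left(\frac{f_1}{f_0}\right)''(0)$, which concludes the proof. ■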
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